PARTICLE METHODS FOR STOCHASTIC OPTIMAL CONTROL PROBLEMS



PIERRE CARPENTIER*, GUY COHEN†, AND ANES DALLAGI*
Abstract. 

When dealing with the numerical solution of stochastic optimal control problems, stochastic dynamic programming is the natural framework. In order to try to overcome the so-called curse of dimensionality, the stochastic programming school promoted another approach based on scenario trees, which can be seen as the combination of Monte Carlo sampling ideas on the one hand, and of a heuristic technique to handle causality (or nonanticipativeness) constraints on the other hand.

However, if one considers that the solution of a stochastic optimal control problem is a feedback law which relates control to state variables, the numerical resolution of the optimization problem over a scenario tree should be completed by a feedback synthesis stage in which, at each time step of the scenario tree, control values at nodes are plotted against corresponding state values to provide a first discrete shape of this feedback law, from which a continuous function can finally be inferred. From this point of view, the scenario tree approach faces an important difficulty. At the first time stages (close to the tree root), there are only a few nodes (or Monte Carlo particles), and therefore a relatively scarce amount of information to guess a feedback law, but this information is generally of good quality (that is, viewed as a set of control value estimates for some particular state values, it has a small variance because the future of those nodes is rich enough). On the contrary, at the final time stages (near the tree leaves), the number of nodes increases but the variance gets large because the future of each node becomes poor (and sometimes even deterministic).

After this dilemma had been confirmed by numerical experiments, we tried to derive new variational approaches. First of all, two different formulations of the essential constraint of nonanticipativeness are considered: one is called algebraic and the other one is called functional. Next, in both settings, we obtain optimality conditions for the corresponding optimal control problem. For the numerical resolution of those optimality conditions, an adaptive mesh discretization method is used in the state space in order to provide information for feedback synthesis. This mesh is naturally derived from a set of sample noise trajectories which need not be put into the form of a tree prior to numerical resolution. In particular, an important consequence of this difference from the scenario tree approach is that the same number of nodes (or points) is available from the beginning to the end of the time horizon, and this is obtained without sacrificing the quality of the results (that is, the variance of the estimates). Results of experiments with a hydro-electric dam production management problem will be presented and will demonstrate the claimed improvements.

Key words. stochastic programming; measurability constraints; discretization

AMS subject classifications. 90C15, 49M25, 62L20

Introduction. Taking into account uncertainties in the decision process has become an important issue for all industries. Facing market volatility, weather whims and changing policies and regulatory constraints, decision makers have to find an optimal way to introduce their decision process into this uncertain framework. One way to take uncertainties into account is to use the stochastic optimization framework. In this approach, the decision maker makes his decision by optimizing a mean value with respect to the multiple possible scenarios weighted by a probability law.

Stochastic optimization problems often involve information constraints: the decision maker makes his decision after getting observations about the possible scenarios. In the literature, two different communities have dealt with this information issue using different modelling techniques, and therefore different solution methods. The question is: knowing that stochastic optimization problems are often infinite-dimensional, how can we design tractable solution methods (that is, methods which can be implemented on a computer)? Each community brought its own answers to this question.

The stochastic programming community models the information structure by scenario trees: this involves Monte Carlo sampling plus some manipulations of the sample trajectories to enforce the tree structure. Tractable solution methods for stochastic optimization problems then consist in solving deterministic problems over the decision trees. We refer to [T51 [T^l HD] for further details on these methods. Nevertheless, stochastic programming faces an important difficulty: due to the tree structure, one has only a few discretization points (nodes of the tree) at the early stages, which could be a serious handicap when attempting to synthesize a feedback law. On the contrary, when approaching the last stages, one has a large number of discretization points but with a future which may be almost or completely deterministic.

The stochastic optimal control community uses special structures of stochastic optimization problems (time sequentiality and state notions) to model the information constraints through a functional interpretation that leads to the Dynamic Programming Principle. We refer to [4] [5] [6] for further details on this method. However, stochastic dynamic programming is also confronted with a serious obstacle known as the "curse of dimensionality". In fact, this method leads one to blindly discretize the whole state space without taking into account the state distribution at the optimal solution, which is generally nonuniform.

This paper tries to bridge the gap between those two communities. We propose a tractable solution method for stochastic optimal control problems which makes use of the good ideas of both Monte Carlo sampling and variational methods on the one hand, and of the functional handling of the information structure on the other hand. Other related works [HI HO] remain closer to the stochastic programming point of view. In our approach, the same number of discretization (sample) points is used from the beginning to the end of the time horizon, and this discretization grid is adaptive: when transposed into the state space, it reflects the optimal state distribution at each time stage.

The proposed method is of a variational nature in that it is based on gradient 
calculations which involve the forward state variable integration and the backward 
adjoint state (co-state) evaluation as in the Pontryagin minimum principle. The 
purpose is to solve the Kuhn-Tucker necessary optimality conditions of the considered
problem (including the information constraints). 

This paper is organized as follows. In §1 we discuss how to model the information structure in stochastic optimization problems. Then we derive optimality conditions for stochastic optimization problems with information constraints. In §2 these optimality conditions are specialized to the situation of stochastic optimal control problems. Special attention is paid to the so-called Markovian case in §2.3. The numerical implementation of a resolution method based on Monte Carlo and functional approximations is then presented. Finally, a case study, namely a hydro-electric dam management problem, illustrates the proposed method and compares it to the standard scenario tree approach.

1. Preliminaries. In this section, we present the main framework of this paper: how to model stochastic optimization problems and how to represent the information structure of such problems. This preliminary section is directly inspired by the work of the System and Optimization Working Group¹ [3J [TH1 [H].

In the rest of the paper, the random variables, defined over a probability space $(\Omega, \mathcal{A}, \mathbb{P})$, will be denoted using bold letters (e.g. $\boldsymbol{\xi} \in L^2(\Omega, \mathcal{A}, \mathbb{P}; \Xi)$) whereas their realizations will be denoted using normal letters (e.g. $\xi \in \Xi$).

1.1. Modelling information. When unknown factors affect a process that we try to control, we consider a set $\Omega$ of possible states of nature $\omega$ among which the true state $\omega_0$ is supposed to be. Roughly speaking, an information structure is a partition of $\Omega$ into a collection of subsets $G$. A posteriori observations will help us to determine in which particular subset of this partition the true state $\omega_0$ lies. Two extreme cases may be mentioned.

1. If the partition is simply the crude partition $\{\emptyset, \Omega\}$, then the a posteriori observations are useless. We will refer to this situation as an open-loop information structure.

2. If the partition is the finest possible one, namely all subsets $G$ are singletons, then the a posteriori observations will tell us exactly what the true state of nature is. This is the situation of perfect knowledge prior to making our decisions.

In between, after the a posteriori observations become available, some uncertainty about the true $\omega_0$ will remain: we will know in which particular $G$ the true state lies, but all $\omega$'s in that $G$ are still possible. Moreover, in dynamic situations, new observations may become available at each time stage, and a partition of $\Omega$ corresponding to this information structure must be considered at each time stage. We refer to the general case as a closed-loop decision process.

In order to have a framework in which various operations on information structures become possible, probability theory introduces so-called $\sigma$-algebras or $\sigma$-fields, random variables and a pre-order relation between random variables. The latter is called measurability².

We refer the reader to [17] for further details on information structure and probability theory. As far as this paper is concerned, we state here a result found in [17, Theorem 8 p. 108] giving the main properties of the measurability relation between random variables.

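PROPOSITION 1.1 (Measurability relation). Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space, and let $\boldsymbol{Y}_1 : \Omega \to \mathbb{Y}_1$ and $\boldsymbol{Y}_2 : \Omega \to \mathbb{Y}_2$ be two random variables taking their values in $\mathbb{Y}_1$ and $\mathbb{Y}_2$ respectively. The following statements are equivalent:

1. $\boldsymbol{Y}_1 \preceq \boldsymbol{Y}_2$,

2. $\sigma(\boldsymbol{Y}_1) \subset \sigma(\boldsymbol{Y}_2)$,

3. there exists a measurable mapping $\phi : \mathbb{Y}_2 \to \mathbb{Y}_1$ such that $\boldsymbol{Y}_1 = \phi \circ \boldsymbol{Y}_2$,

4. $\boldsymbol{Y}_1 = \mathbb{E}(\boldsymbol{Y}_1 \mid \boldsymbol{Y}_2)$, $\mathbb{P}$-a.s.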
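On a finite sample space in which every outcome has positive probability, Statements 3 and 4 of Proposition 1.1 can be checked directly. The following Python sketch is an illustration of ours (the sample space, the probabilities and the helper names `cond_expectation` and `is_function_of` are hypothetical): Statement 3 holds exactly when $\boldsymbol{Y}_1$ is constant on every level set $\{\boldsymbol{Y}_2 = y\}$, and Statement 4 is tested by computing $\mathbb{E}(\boldsymbol{Y}_1 \mid \boldsymbol{Y}_2)$ atom by atom.

```python
from collections import defaultdict

def cond_expectation(y1, y2, prob):
    """E(Y1 | Y2) on a finite sample space, returned as a list indexed by omega."""
    mass = defaultdict(float)   # P(Y2 = y) for each observed value y
    acc = defaultdict(float)    # E(Y1 * 1{Y2 = y}) for each observed value y
    for a, b, p in zip(y1, y2, prob):
        mass[b] += p
        acc[b] += a * p
    return [acc[b] / mass[b] for b in y2]

def is_function_of(y1, y2):
    """Statement 3: Y1 = phi(Y2) for some phi iff Y1 is constant on each level set {Y2 = y}."""
    phi = {}
    for a, b in zip(y1, y2):
        if phi.setdefault(b, a) != a:
            return False
    return True

# Hypothetical 4-point sample space with uniform probability.
prob = [0.25, 0.25, 0.25, 0.25]
y2 = [0, 0, 1, 1]                    # the conditioning variable Y2
y1_meas = [5.0, 5.0, -1.0, -1.0]     # constant on {Y2 = 0} and on {Y2 = 1}
y1_not = [5.0, 3.0, -1.0, -1.0]      # varies inside {Y2 = 0}

for y1 in (y1_meas, y1_not):
    stmt3 = is_function_of(y1, y2)
    stmt4 = all(abs(a - c) < 1e-12 for a, c in zip(y1, cond_expectation(y1, y2, prob)))
    print(stmt3, stmt4)
```

On this example the two tests agree, as the proposition predicts: the first candidate passes both, the second fails both.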

1.2. Modelling a stochastic optimization problem. In this section, we consider two interpretations of a stochastic optimization problem: an algebraic one, in which the information constraint is modelled through the standard measurability relation (Statement 1 of Proposition 1.1), and a functional interpretation, in which we use the functional equivalent of the measurability relation (Statement 3 of Proposition 1.1).

¹ SOWG, École Nationale des Ponts et Chaussées: Laetitia Andrieu, Kengy Barty, Pierre Carpentier, Jean-Philippe Chancelier, Guy Cohen, Anes Dallagi, Michel De Lara, Pierre Girardeau, Babacar Seck, Cyrille Strugarek.

² A random variable $\boldsymbol{Y}_1$ is measurable with respect to another random variable $\boldsymbol{Y}_2$ if and only if the $\sigma$-field generated by the first is included in the one generated by the second: $\sigma(\boldsymbol{Y}_1) \subset \sigma(\boldsymbol{Y}_2)$. We do not give further details on these definitions in this paper, and refer the reader to [7] for probability and measurability theory.

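1.2.1. Algebraic interpretation. Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and let $\boldsymbol{\xi} : \Omega \to \Xi$ be a random variable taking values in $\Xi := \mathbb{R}^{d_\xi}$ (noise space). We denote by $\mathbb{U} := \mathbb{R}^{d_u}$ the control space and by $\mathcal{U}$ the functional space $L^2(\Omega, \mathcal{A}, \mathbb{P}; \mathbb{U})$. The cost function $J : \mathcal{U} \to \mathbb{R}$ is defined as the expectation of a normal integrand³ $j : \mathbb{U} \times \Xi \to \mathbb{R}$:

$$J(\boldsymbol{U}) := \mathbb{E}\big(j(\boldsymbol{U}, \boldsymbol{\xi})\big) = \int_\Omega j\big(\boldsymbol{U}(\omega), \boldsymbol{\xi}(\omega)\big)\, d\mathbb{P}(\omega).$$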

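The feasible set $\mathcal{U}^{as} \cap \mathcal{U}^{mc}$ accounts for two different constraints:

• almost-sure constraints:

$$\mathcal{U}^{as} := \{\boldsymbol{U} \in \mathcal{U},\ \boldsymbol{U}(\omega) \in \Gamma^{as}(\omega),\ \mathbb{P}\text{-a.s.}\}, \qquad (1.1)$$

where $\Gamma^{as} : \Omega \rightrightarrows \mathbb{U}$ is a measurable set-valued mapping (see [18, Definition 14.1]) which is convex and closed valued, and

• measurability constraints:

$$\mathcal{U}^{mc} := \{\boldsymbol{U} \in \mathcal{U},\ \boldsymbol{U} \text{ is } \mathcal{G}\text{-measurable}\}, \qquad (1.2)$$

where $\mathcal{G}$ is a given sub-$\sigma$-field of $\mathcal{A}$.

From their respective definitions, it is easy to prove that $\mathcal{U}^{mc}$ is a closed linear subspace of $\mathcal{U}$ (namely $L^2(\Omega, \mathcal{G}, \mathbb{P}; \mathbb{U})$) and that $\mathcal{U}^{as}$ is a closed convex subset of $\mathcal{U}$.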

The optimization problem under consideration is to minimize the cost function $J(\boldsymbol{U})$ over the feasible set $\mathcal{U}^{as} \cap \mathcal{U}^{mc}$. The first model representing the stochastic optimization problem is thus

$$\min_{\boldsymbol{U} \in \mathcal{U}} J(\boldsymbol{U}) \quad \text{s.t.} \quad \boldsymbol{U} \in \mathcal{U}^{as} \cap \mathcal{U}^{mc}. \qquad (1.3)$$

This interpretation is called algebraic as it uses an algebraic relation (measurability) to define the information structure of Problem (1.3).

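1.2.2. Functional interpretation. According to Proposition 1.1, a functional model for stochastic optimization problems is available, in which the optimization is achieved with respect to functions called feedbacks. Indeed, let

• $\boldsymbol{Y} : \Omega \to \mathbb{Y}$ be a random variable (called the observation) taking values in $\mathbb{Y} := \mathbb{R}^{d_y}$,

• $\Phi$ be the space $L^2(\mathbb{Y}, \mathcal{B}(\mathbb{Y}), \mathbb{P}_{\boldsymbol{Y}}; \mathbb{U})$, where $\mathcal{B}(\mathbb{Y})$ is the Borel $\sigma$-field of $\mathbb{Y}$ and $\mathbb{P}_{\boldsymbol{Y}}$ the image of the probability measure $\mathbb{P}$ by $\boldsymbol{Y}$,

• $\Phi^{as}$ be the subset of $\Phi$ defined by

$$\Phi^{as} := \{\phi \in \Phi,\ \phi \circ \boldsymbol{Y}(\omega) \in \Gamma^{as}(\omega),\ \mathbb{P}\text{-a.s.}\}.$$

We define the cost function $\mathcal{J} : \Phi \to \mathbb{R}$ as $\mathcal{J}(\phi) := \mathbb{E}\big(j(\phi(\boldsymbol{Y}), \boldsymbol{\xi})\big)$. We are interested in the functional optimization problem:

$$\min_{\phi \in \Phi} \mathcal{J}(\phi) \quad \text{s.t.} \quad \phi \in \Phi^{as}. \qquad (1.4)$$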



³ See [18, Definition 14.27]. Recall that the normal integrand assumption is made to ensure measurability properties [18, Proposition 14.28].
Proposition 1.2. If $\sigma(\boldsymbol{Y}) = \mathcal{G}$, then Problem (1.3) is equivalent to Problem (1.4) in the sense that if $\boldsymbol{U}^\sharp$ is a solution of (1.3) (resp. $\phi^\sharp$ is a solution of (1.4)), then there exists a solution $\phi^\sharp$ of (1.4) (resp. a solution $\boldsymbol{U}^\sharp$ of (1.3)) such that $\boldsymbol{U}^\sharp = \phi^\sharp(\boldsymbol{Y})$, $\mathbb{P}$-a.s.

Proof. This is a straightforward consequence of Proposition 1.1 and of the definition of the feasible sets in both problems. □
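A minimal illustration of Proposition 1.2, under assumptions of ours that are not taken from the paper: a finite sample space, no almost-sure constraint ($\Gamma^{as} = \mathbb{U}$) and the quadratic integrand $j(u, \xi) = \frac{1}{2}(u - \xi)^2$. In that case the optimal feedback is $\phi^\sharp(y) = \mathbb{E}(\boldsymbol{\xi} \mid \boldsymbol{Y} = y)$, and $\boldsymbol{U}^\sharp = \phi^\sharp(\boldsymbol{Y})$ is the corresponding solution of the algebraic problem (1.3) with $\mathcal{G} = \sigma(\boldsymbol{Y})$. The data and function names are hypothetical.

```python
import itertools

# Hypothetical finite sample space: probabilities, noise xi(omega) and observation Y(omega).
prob = [0.1, 0.2, 0.2, 0.1, 0.3, 0.1]
xi = [1.0, 3.0, 2.0, -1.0, 0.0, 4.0]
obs = [0, 0, 0, 1, 1, 1]

def cost(phi):
    """Functional cost of (1.4): E(0.5 * (phi(Y) - xi)^2) for a feedback phi given as a dict."""
    return sum(p * 0.5 * (phi[y] - x) ** 2 for p, x, y in zip(prob, xi, obs))

def optimal_feedback():
    """For this unconstrained quadratic integrand, phi(y) = E(xi | Y = y) minimises the cost."""
    phi = {}
    for y in set(obs):
        mass = sum(p for p, o in zip(prob, obs) if o == y)
        phi[y] = sum(p * x for p, x, o in zip(prob, xi, obs) if o == y) / mass
    return phi

phi_star = optimal_feedback()
u_star = [phi_star[y] for y in obs]   # algebraic solution of (1.3): U = phi(Y), constant on {Y = y}

# Perturbing the feedback (it remains a function of Y, hence sigma(Y)-measurable) never helps.
for d0, d1 in itertools.product([-0.5, 0.0, 0.5], repeat=2):
    trial = {0: phi_star[0] + d0, 1: phi_star[1] + d1}
    assert cost(trial) >= cost(phi_star) - 1e-12
print(phi_star)
```

Here $\phi^\sharp(0) = 2.2$ and $\phi^\sharp(1) = 0.6$; the random variable `u_star` is the same object seen through the algebraic interpretation.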

1.3. Optimality conditions for a stochastic optimization problem. Previous works dealt with optimality conditions for stochastic optimal control problems [IHEI]. This paragraph can be viewed as a slight extension of [TS] to a stochastic optimization problem subject to almost-sure and measurability constraints.

We consider Problem (1.3) presented in §1.2. We recall that the feasible set is the intersection of a closed convex subset $\mathcal{U}^{as}$ and of a closed linear subspace $\mathcal{U}^{mc}$. In order to apply the results given in Appendix A, we first establish the following lemma.

Lemma 1.3. Let $\mathcal{U}^{as}$ be the convex set defined by the almost-sure constraints (1.1) and let $\mathcal{U}^{mc}$ be the linear subspace defined by the measurability constraints (1.2). We assume that $\Gamma^{as}$ is a $\mathcal{G}$-measurable and closed convex valued mapping. Then

$$\mathrm{proj}_{\mathcal{U}^{as}}(\mathcal{U}^{mc}) \subset \mathcal{U}^{mc}.$$

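Proof. We first prove that $\big(\mathrm{proj}_{\mathcal{U}^{as}}(\boldsymbol{U})\big)(\omega) = \mathrm{proj}_{\Gamma^{as}(\omega)}\big(\boldsymbol{U}(\omega)\big)$, $\mathbb{P}$-a.s. Indeed, let $F(\boldsymbol{V}) := \frac{1}{2}\|\boldsymbol{V} - \boldsymbol{U}\|_{\mathcal{U}}^2 + \chi_{\mathcal{U}^{as}}(\boldsymbol{V})$. From the definition of the almost-sure constraints, we have $F(\boldsymbol{V}) = \int_\Omega f(\boldsymbol{V}(\omega), \omega)\, d\mathbb{P}(\omega)$, with $f(v, \omega) := \frac{1}{2}\|v - \boldsymbol{U}(\omega)\|_{\mathbb{U}}^2 + \chi_{\Gamma^{as}(\omega)}(v)$. By definition of the projection, $\mathrm{proj}_{\mathcal{U}^{as}}(\boldsymbol{U})$ is the solution of the optimization problem

$$\min_{\boldsymbol{V} \in \mathcal{U}} \int_\Omega f\big(\boldsymbol{V}(\omega), \omega\big)\, d\mathbb{P}(\omega).$$

Using [18, Theorem 14.60] (interchange of minimization and integration), we obtain

$$\big(\mathrm{proj}_{\mathcal{U}^{as}}(\boldsymbol{U})\big)(\omega) \in \operatorname*{argmin}_{v \in \mathbb{U}} f(v, \omega) = \big\{\mathrm{proj}_{\Gamma^{as}(\omega)}\big(\boldsymbol{U}(\omega)\big)\big\}, \quad \mathbb{P}\text{-a.s.}, \qquad (1.5)$$

hence the claimed property.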

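Let us consider any $\boldsymbol{U} \in \mathcal{U}^{mc}$. We deduce from (1.5) that

$$\big(\mathrm{proj}_{\mathcal{U}^{as}}(\boldsymbol{U})\big)(\omega) = \operatorname*{argmin}_{v \in \Gamma^{as}(\omega)} \frac{1}{2}\|v - \boldsymbol{U}(\omega)\|^2, \quad \mathbb{P}\text{-a.s.}$$

From [1, Theorem 8.2.11] (measurability of marginal functions), we deduce from the $\mathcal{G}$-measurability of both $\boldsymbol{U}$ and $\Gamma^{as}$ that the argmin function $\mathrm{proj}_{\mathcal{U}^{as}}(\boldsymbol{U})$ is also a $\mathcal{G}$-measurable function, which means that $\mathrm{proj}_{\mathcal{U}^{as}}(\boldsymbol{U}) \in \mathcal{U}^{mc}$. □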
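The pointwise character of the projection established in Lemma 1.3 can be sketched numerically. Assume (hypothetically) a finite sample space and box-valued constraints $\Gamma^{as}(\omega) = [\mathrm{lo}(\omega), \mathrm{hi}(\omega)]$ whose bounds depend on $\omega$ only through $\boldsymbol{Y}(\omega)$, so that $\Gamma^{as}$ is $\mathcal{G}$-measurable with $\mathcal{G} = \sigma(\boldsymbol{Y})$. Projecting onto a box is a componentwise clip, and a $\mathcal{G}$-measurable input stays constant on the atoms of $\mathcal{G}$ after projection, as the lemma asserts.

```python
import numpy as np

# Hypothetical finite sample space with 6 outcomes; Y generates the sub-sigma-field G.
obs = np.array([0, 0, 1, 1, 2, 2])

# Box constraint Gamma(omega) = [lo(omega), hi(omega)]; the bounds depend on omega only
# through Y(omega), so Gamma is G-measurable.
lo = np.array([-1.0, 0.0, 0.5])[obs]
hi = np.array([1.0, 2.0, 0.8])[obs]

def project_as(u):
    """Projection onto U^as computed omega per omega: a componentwise clip onto each box."""
    return np.clip(u, lo, hi)

def is_g_measurable(u):
    """A random variable is G-measurable iff it is constant on each atom {Y = y}."""
    return all(np.all(u[obs == y] == u[obs == y][0]) for y in np.unique(obs))

u = np.array([3.0, 3.0, -2.0, -2.0, 0.6, 0.6])   # a G-measurable input
pu = project_as(u)
print(pu, is_g_measurable(pu))
```

Because the squared $L^2$ distance is an integral of pointwise squared distances, the clip is the $L^2(\mathbb{P})$ projection whatever the (positive) probabilities of the outcomes are.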

The main result of this section is given by the following theorem. 

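Theorem 1.4. Assume that $\Gamma^{as}$ is a $\mathcal{G}$-measurable and closed convex valued mapping, that the function $j$ is a normal integrand such that $j(\cdot, \boldsymbol{\xi}(\omega))$ is differentiable for $\mathbb{P}$-almost every $\omega$, and that $j'_u\big(\boldsymbol{U}(\cdot), \boldsymbol{\xi}(\cdot)\big) \in \mathcal{U}$ for all $\boldsymbol{U} \in \mathcal{U}$. Let $\boldsymbol{U}^\sharp$ be a solution of Problem (1.3). Then

$$\mathbb{E}\big(j'_u(\boldsymbol{U}^\sharp, \boldsymbol{\xi}) \mid \mathcal{G}\big) \in -\partial \chi_{\mathcal{U}^{as}}(\boldsymbol{U}^\sharp). \qquad (1.6)$$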
Proof. The differentiability of $J : \mathcal{U} \to \mathbb{R}$ is a straightforward consequence of both the integral expression of $J$ and the differentiability assumption on $j$. Moreover, the derivative of $J$ is given by

$$J'(\boldsymbol{U})(\omega) = j'_u\big(\boldsymbol{U}(\omega), \boldsymbol{\xi}(\omega)\big), \quad \mathbb{P}\text{-a.s.}$$

Let $\boldsymbol{U}^\sharp$ be a solution of (1.3). From Lemma 1.3 and Proposition A.2 in the Appendix, we obtain

$$\mathrm{proj}_{\mathcal{U}^{mc}}\big(J'(\boldsymbol{U}^\sharp)\big) \in -\partial \chi_{\mathcal{U}^{as}}(\boldsymbol{U}^\sharp),$$

this last expression being equivalent to (1.6) thanks to the characterization of the conditional expectation as a projection and the expression of $J'$. □

Using the equivalent statements (A.2), the optimality conditions given by Theorem 1.4 can be reformulated as

$$\boldsymbol{U}^\sharp = \mathrm{proj}_{\mathcal{U}^{as}}\Big(\boldsymbol{U}^\sharp - \varepsilon\, \mathrm{proj}_{\mathcal{U}^{mc}}\big(\nabla J(\boldsymbol{U}^\sharp)\big)\Big), \quad \forall \varepsilon > 0.$$

This last formulation can be used in practice to solve Problem (1.3) by means of a projected gradient algorithm. The projection onto $\mathcal{U}^{as}$ involves random variables, but we saw in the proof of Lemma 1.3 that $\mathrm{proj}_{\mathcal{U}^{as}}$ can be performed pointwise ($\omega$ per $\omega$), so that the algorithm can actually be implemented.
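The fixed-point formulation above suggests the iteration $\boldsymbol{U}^{k+1} = \mathrm{proj}_{\mathcal{U}^{as}}\big(\boldsymbol{U}^k - \varepsilon\, \mathbb{E}(\nabla J(\boldsymbol{U}^k) \mid \mathcal{G})\big)$. The sketch below runs it on a toy instance of our own (finite sample space, $j(u, \xi) = \frac{1}{2}(u - \xi)^2$, $\mathcal{G} = \sigma(\boldsymbol{Y})$ and $\mathcal{G}$-measurable box constraints); the projection onto $\mathcal{U}^{mc}$ is the conditional expectation given $\boldsymbol{Y}$, and the projection onto $\mathcal{U}^{as}$ is applied $\omega$ per $\omega$, as allowed by Lemma 1.3. The step size and the iteration count are arbitrary choices.

```python
import numpy as np

# Hypothetical toy instance: finite sample space, j(u, xi) = 0.5 * (u - xi)^2, G = sigma(Y).
prob = np.array([0.1, 0.2, 0.2, 0.1, 0.3, 0.1])
xi = np.array([1.0, 3.0, 2.0, -1.0, 0.0, 4.0])
obs = np.array([0, 0, 0, 1, 1, 1])
lo = np.array([-1.0, 1.0])[obs]          # G-measurable box constraints Gamma(omega) = [lo, hi]
hi = np.array([2.0, 3.0])[obs]

def cond_exp(z):
    """Projection onto U^mc in L^2(P): the conditional expectation E(z | Y), atom by atom."""
    out = np.empty_like(z)
    for y in np.unique(obs):
        atom = obs == y
        out[atom] = np.dot(prob[atom], z[atom]) / prob[atom].sum()
    return out

def grad_j(u):
    """Pointwise gradient j'_u(u, xi) = u - xi, i.e. J'(U)(omega)."""
    return u - xi

def projected_gradient(eps=0.5, iters=200):
    u = np.zeros_like(xi)
    for _ in range(iters):
        # G-conditioned gradient step, then projection onto U^as omega per omega (a clip)
        u = np.clip(u - eps * cond_exp(grad_j(u)), lo, hi)
    return u

u_opt = projected_gradient()
print(u_opt)   # approximately [2, 2, 2, 1, 1, 1]
```

For this quadratic integrand the limit can be checked by hand: on each atom $\{\boldsymbol{Y} = y\}$ it is the conditional mean of $\boldsymbol{\xi}$ clipped to the box, here 2.2 clipped to $[-1, 2]$ and 0.6 clipped to $[1, 3]$, and it is a fixed point of the map for every $\varepsilon > 0$.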

Remark 1. The optimality conditions (1.6), which are given in terms of random variables, also have a pointwise interpretation. As a matter of fact, using the equivalent statements (A.2) and the pointwise interpretation of $\mathrm{proj}_{\mathcal{U}^{as}}$, it is straightforward to prove that $\boldsymbol{R} \in \partial \chi_{\mathcal{U}^{as}}(\boldsymbol{U})$ implies $\boldsymbol{R}(\omega) \in \partial \chi_{\Gamma^{as}(\omega)}\big(\boldsymbol{U}(\omega)\big)$, $\mathbb{P}$-a.s.

2. Stochastic optimal control problems. Stochastic optimal control problems rest upon the same framework as stochastic optimization problems: they can be modelled as closed-loop stochastic optimization problems with a sequential time structure. In this section, we deal with the algebraic interpretation of stochastic optimal control problems in a discrete-time framework. We derive optimality conditions following the same principle as in §1.3.

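2.1. Problem formulation. We consider a stochastic optimal control problem in discrete time, $T$ denoting the time horizon. At each stage $t = 0, \dots, T$, we denote by $\mathbb{W}_t$ the noise space at time $t$ (a finite-dimensional Euclidean space). Let $\mathcal{W}_t$ be a set of random variables defined on $(\Omega, \mathcal{A}, \mathbb{P})$ and taking values in $\mathbb{W}_t$, and let $\mathcal{W} := \mathcal{W}_0 \times \dots \times \mathcal{W}_T$. The noise process of the problem is a random vector $\boldsymbol{W} \in \mathcal{W}$ such that $\boldsymbol{W} = (\boldsymbol{W}_0, \dots, \boldsymbol{W}_T)$, with $\boldsymbol{W}_t \in \mathcal{W}_t$.

At each stage $t = 0, \dots, T-1$, we denote by $\mathbb{U}_t$ the control space (a finite-dimensional Euclidean space) and by $\mathcal{U}_t := L^2(\Omega, \mathcal{A}, \mathbb{P}; \mathbb{U}_t)$ the space of square-integrable random variables taking values in $\mathbb{U}_t$. At $t$, the decision maker makes a decision (a control) $\boldsymbol{U}_t \in \mathcal{U}_t$. Let $\mathcal{U} := \mathcal{U}_0 \times \dots \times \mathcal{U}_{T-1}$. The control process of the problem is a random vector $\boldsymbol{U} \in \mathcal{U}$ such that $\boldsymbol{U} = (\boldsymbol{U}_0, \dots, \boldsymbol{U}_{T-1})$, with $\boldsymbol{U}_t \in \mathcal{U}_t$. We also assume that each control variable $\boldsymbol{U}_t$ is subject to almost-sure constraints. More precisely, for all $t = 0, \dots, T-1$, let $\Gamma_t^{as} : \Omega \rightrightarrows \mathbb{U}_t$ be a set-valued mapping (random set), let $\mathcal{U}_t^{as} := \{\boldsymbol{U}_t \in \mathcal{U}_t,\ \boldsymbol{U}_t(\omega) \in \Gamma_t^{as}(\omega),\ \mathbb{P}\text{-a.s.}\}$ and let $\mathcal{U}^{as} := \mathcal{U}_0^{as} \times \dots \times \mathcal{U}_{T-1}^{as}$. The almost-sure constraints read

$$\boldsymbol{U}_t \in \mathcal{U}_t^{as}, \quad \forall t = 0, \dots, T-1. \qquad (2.1)$$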
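2.1.1. Dynamics. At each stage $t = 0, \dots, T$, we denote by $\mathbb{X}_t$ the state space at time $t$ (a finite-dimensional Euclidean space). Let $\mathcal{X}_t$ be a set of random variables defined on $(\Omega, \mathcal{A}, \mathbb{P})$ and taking values in $\mathbb{X}_t$, and let $\mathcal{X} := \mathcal{X}_0 \times \dots \times \mathcal{X}_T$. The state process of the problem is a random vector $\boldsymbol{X} \in \mathcal{X}$ such that $\boldsymbol{X} = (\boldsymbol{X}_0, \dots, \boldsymbol{X}_T)$, with $\boldsymbol{X}_t \in \mathcal{X}_t$. It arises from the dynamics $f_t : \mathbb{X}_t \times \mathbb{U}_t \times \mathbb{W}_{t+1} \to \mathbb{X}_{t+1}$ of the system, namely

$$\boldsymbol{X}_0 = \boldsymbol{W}_0, \quad \mathbb{P}\text{-a.s.}, \qquad (2.2a)$$
$$\boldsymbol{X}_{t+1} = f_t(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}), \quad \mathbb{P}\text{-a.s.}, \ \forall t = 0, \dots, T-1. \qquad (2.2b)$$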



Remark 2. The shift of the time index between the random variable $\boldsymbol{X}_t$ and the control $\boldsymbol{U}_t$ on the one hand, and the noise $\boldsymbol{W}_{t+1}$ on the other hand, highlights the fact that we are in the so-called decision-hazard scheme: at time $t$, the decision maker chooses a control variable $\boldsymbol{U}_t$ before having any information about the noise $\boldsymbol{W}_{t+1}$ which affects the dynamics of the system.

2.1.2. Cost function. We consider at each stage $t = 0, \dots, T-1$ an "integral" cost function $L_t : \mathbb{X}_t \times \mathbb{U}_t \times \mathbb{W}_{t+1} \to \mathbb{R}$. Moreover, at the final stage $T$, we consider a final cost function $K : \mathbb{X}_T \to \mathbb{R}$. Summing up all these instantaneous costs, we obtain the overall cost function of the problem

$$j(x, u, w) := \sum_{t=0}^{T-1} L_t(x_t, u_t, w_{t+1}) + K(x_T),$$

and the decision maker has to choose the control variables $\boldsymbol{U}_t$ in order to minimize the overall cost expectation

$$J(\boldsymbol{X}, \boldsymbol{U}) := \mathbb{E}\Big(\sum_{t=0}^{T-1} L_t(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}) + K(\boldsymbol{X}_T)\Big). \qquad (2.3)$$
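The expected cost (2.3) of a given control strategy can be estimated by simulating the dynamics (2.2) on a sample of noise trajectories. The sketch below is illustrative only: the scalar dynamics, the costs and the two state feedbacks $u = 0$ and $u = -x/2$ are hypothetical choices, and Gaussian white noise is an assumption. A feedback of the form $\boldsymbol{U}_t = \phi_t(\boldsymbol{X}_t)$ respects the decision-hazard scheme of Remark 2, since $\boldsymbol{X}_t$ does not depend on $\boldsymbol{W}_{t+1}$.

```python
import numpy as np

def mc_cost(f, L, K, policy, noise):
    """Monte Carlo estimate of (2.3); noise has shape (N, T + 1), column t holding samples of W_t."""
    T = noise.shape[1] - 1
    x = noise[:, 0]                          # X_0 = W_0, see (2.2a)
    total = np.zeros(noise.shape[0])
    for t in range(T):
        u = policy(t, x)                     # U_t is chosen before W_{t+1} is revealed
        w_next = noise[:, t + 1]
        total += L(t, x, u, w_next)
        x = f(t, x, u, w_next)               # X_{t+1} = f_t(X_t, U_t, W_{t+1}), see (2.2b)
    return (total + K(x)).mean()

# Hypothetical scalar data: f_t = x + u + w, L_t = 0.5 (x^2 + u^2), K = 0.5 x^2.
def f(t, x, u, w):
    return x + u + w

def L(t, x, u, w):
    return 0.5 * (x**2 + u**2)

def K(x):
    return 0.5 * x**2

rng = np.random.default_rng(0)
W = rng.standard_normal((200_000, 4))        # N = 200000 Gaussian white-noise paths, T = 3
est_open = mc_cost(f, L, K, lambda t, x: np.zeros_like(x), W)
est_fb = mc_cost(f, L, K, lambda t, x: -0.5 * x, W)
print(est_open, est_fb)
```

With $u = 0$ one has $\mathbb{E}(\boldsymbol{X}_t^2) = t + 1$, so the exact cost is $\frac{1}{2}(1 + 2 + 3 + 4) = 5$; with $u = -x/2$ the variance recursion $v_{t+1} = v_t / 4 + 1$ gives an exact cost of $2.890625$. Both Monte Carlo estimates should be close to these values.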

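2.1.3. Information structure. Generally speaking, the information available on the system at time $t$ is modelled as a random variable $\boldsymbol{Y}_t : \Omega \to \mathbb{Y}_t$, where $\mathbb{Y}_t$ (a finite-dimensional Euclidean space) is the observation space. Let $\mathcal{Y}_t$ be the set of random variables taking their values in $\mathbb{Y}_t$. We suppose that there exists an observation mapping $h_t : \mathbb{W}_0 \times \dots \times \mathbb{W}_T \to \mathbb{Y}_t$ such that $\boldsymbol{Y}_t = h_t(\boldsymbol{W}_0, \dots, \boldsymbol{W}_T)$, $\mathbb{P}$-a.s. For all $t = 0, \dots, T$, we denote by $\mathcal{G}_t = \sigma(\boldsymbol{Y}_t)$ the sub-$\sigma$-field of $\mathcal{A}$ generated by the random variable $\boldsymbol{Y}_t$⁴:

$$\mathcal{G}_t = \sigma\big(h_t(\boldsymbol{W}_0, \dots, \boldsymbol{W}_T)\big).$$

The decision maker knows $\boldsymbol{Y}_t$ when choosing the appropriate control $\boldsymbol{U}_t$ at time $t$, so that the information constraint is $\boldsymbol{U}_t \preceq \boldsymbol{Y}_t$. Using the notations $\mathcal{U}_t^{mc} = \{\boldsymbol{U}_t \in \mathcal{U}_t,\ \boldsymbol{U}_t \text{ is } \mathcal{G}_t\text{-measurable}\}$ and $\mathcal{U}^{mc} = \mathcal{U}_0^{mc} \times \dots \times \mathcal{U}_{T-1}^{mc}$, the measurability constraints of the problem read

$$\boldsymbol{U}_t \in \mathcal{U}_t^{mc}, \quad \forall t = 0, \dots, T-1. \qquad (2.4)$$

We also introduce the sequence of $\sigma$-fields $(\mathcal{F}_t)_{t=0,\dots,T}$ associated with the noise process $\boldsymbol{W}$, where $\mathcal{F}_t$ is the $\sigma$-field generated by the noises up to time $t$:

$$\mathcal{F}_t = \sigma(\boldsymbol{W}_0, \dots, \boldsymbol{W}_t), \quad \forall t = 0, \dots, T.$$

This sequence is a filtration as it satisfies the inclusions $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \dots \subset \mathcal{F}_T \subset \mathcal{A}$. When $\mathcal{G}_t = \mathcal{F}_t$, we are in the case of complete causal information.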

⁴ Note that the $\sigma$-fields $\mathcal{G}_t$ are "fixed", in the sense that they do not depend on the control variable $\boldsymbol{U}$.
2.1.4. Optimization problem. According to the notation given in the previous paragraphs, the stochastic optimal control problem we consider here is

$$\min_{(\boldsymbol{U}, \boldsymbol{X})} \mathbb{E}\Big(\sum_{t=0}^{T-1} L_t(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}) + K(\boldsymbol{X}_T)\Big), \qquad (2.5a)$$

subject to both the dynamics constraints (2.2) and the control constraints

$$\boldsymbol{U}_t \in \mathcal{U}_t^{as} \cap \mathcal{U}_t^{mc}, \quad \forall t = 0, \dots, T-1. \qquad (2.5b)$$

We follow here the algebraic interpretation of an optimization problem given in §1.2, as the information structure is defined using a measurability relation.

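In the way we modelled Problem (2.5), we have to minimize the cost function $J(\boldsymbol{X}, \boldsymbol{U})$ with respect to both $\boldsymbol{U}$ and $\boldsymbol{X}$ under the dynamics constraints (2.2). But the state variables $\boldsymbol{X}_t$ are in fact intermediate variables, and it is possible to eliminate them by recursively incorporating the dynamics equations (2.2) into the cost function (2.3). The resulting cost function $J$ only depends on the control variables $\boldsymbol{U}_t$. Its expression is $J(\boldsymbol{U}) = \mathbb{E}\big(j(\boldsymbol{U}, \boldsymbol{W})\big)$, with (slightly abusing the notation for $L_t$ and $K$ once the states have been eliminated)

$$j(u, w) = \sum_{t=0}^{T-1} L_t(u_0, \dots, u_t, w_0, \dots, w_{t+1}) + K(u_0, \dots, u_{T-1}, w_0, \dots, w_T).$$

In this setting, Problem (2.5) is equivalent to

$$\min_{\boldsymbol{U} \in \mathcal{U}} J(\boldsymbol{U}) \quad \text{s.t.} \quad \boldsymbol{U} \in \mathcal{U}^{as} \cap \mathcal{U}^{mc}. \qquad (2.6)$$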

The derivatives of the cost function $J$ can be obtained from the derivatives of $j$, which are computed by the well-known adjoint state method. This is based on the following result, stated here without proof.

Proposition 2.1. Assuming that the functions $f_t$ and $L_t$ are continuously differentiable with respect to their first two arguments and that the function $K$ is continuously differentiable, the partial derivatives of $j$ with respect to $u_t$ are given by

$$j'_{u_t}(u, w) = (L_t)'_u(x_t, u_t, w_{t+1}) + \lambda_{t+1}^{\top} (f_t)'_u(x_t, u_t, w_{t+1}),$$

where the state vector $(x_0, \dots, x_T)$ satisfies the forward dynamics equation

$$x_0 = w_0, \qquad x_{t+1} = f_t(x_t, u_t, w_{t+1}),$$

whereas the adjoint state (or co-state) vector $(\lambda_0, \dots, \lambda_T)$ is chosen to satisfy the backward dynamics equation

$$\lambda_T = K'(x_T)^{\top}, \qquad \lambda_t = (L_t)'_x(x_t, u_t, w_{t+1})^{\top} + (f_t)'_x(x_t, u_t, w_{t+1})^{\top} \lambda_{t+1}.$$
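Proposition 2.1 is the usual reverse-mode computation: one forward pass for the states, one backward pass for the co-states. The minimal scalar sketch below uses dynamics and costs that are hypothetical choices of ours, and compares the adjoint gradient with central finite differences of the reduced cost $j(\cdot, w)$ for one fixed noise trajectory.

```python
import numpy as np

# Hypothetical scalar instance:
#   f_t(x, u, w) = x + u * w + 0.1 * x**2,  L_t(x, u, w) = 0.5 * u**2 + cos(x),  K(x) = 0.5 * x**2
f = lambda x, u, w: x + u * w + 0.1 * x**2
f_x = lambda x, u, w: 1.0 + 0.2 * x
f_u = lambda x, u, w: w
L = lambda x, u, w: 0.5 * u**2 + np.cos(x)
L_x = lambda x, u, w: -np.sin(x)
L_u = lambda x, u, w: u
K = lambda x: 0.5 * x**2
K_x = lambda x: x

def cost(u, w):
    """Reduced cost j(u, w): the states are eliminated through the dynamics."""
    x, total = w[0], 0.0
    for t in range(len(u)):
        total += L(x, u[t], w[t + 1])
        x = f(x, u[t], w[t + 1])
    return total + K(x)

def adjoint_gradient(u, w):
    """Gradient of j(., w) by a forward state pass and a backward co-state pass."""
    T = len(u)
    x = np.empty(T + 1)
    x[0] = w[0]                                    # x_0 = w_0
    for t in range(T):
        x[t + 1] = f(x[t], u[t], w[t + 1])         # forward dynamics
    lam = K_x(x[T])                                # lambda_T = K'(x_T)
    grad = np.empty(T)
    for t in reversed(range(T)):                   # here lam holds lambda_{t+1}
        grad[t] = L_u(x[t], u[t], w[t + 1]) + lam * f_u(x[t], u[t], w[t + 1])
        lam = L_x(x[t], u[t], w[t + 1]) + f_x(x[t], u[t], w[t + 1]) * lam   # lambda_t
    return grad

rng = np.random.default_rng(1)
T = 4
u, w = rng.normal(size=T), rng.normal(size=T + 1)
h = 1e-6
fd = np.array([(cost(u + h * e, w) - cost(u - h * e, w)) / (2 * h) for e in np.eye(T)])
print(np.max(np.abs(adjoint_gradient(u, w) - fd)))   # of the order of the finite-difference error
```

The cost of the adjoint gradient is two passes over the horizon regardless of the number of controls, whereas finite differences need two cost evaluations per control component.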



2.1.5. Assumptions. In order to derive optimality conditions for Problem (2.5), we make the following assumptions.

Assumption 1 (Constraints structure).

$\Gamma_t^{as}$ are closed convex set-valued mappings, $\forall t = 0, \dots, T-1$.
Assumption 2 (Differentiability).

1. Functions $f_t$ (dynamics) and $L_t$ (cost) are continuously differentiable with respect to their first two arguments (state and control), $\forall t = 0, \dots, T-1$.

2. Function $K$ (final cost) is continuously differentiable.

3. Functions $L_t$ and $f_t$ are normal integrands, $\forall t = 0, \dots, T-1$.

4. The derivatives of $f_t$, $L_t$ and $K$ are square integrable, $\forall t = 0, \dots, T-1$.

Assumption 3 (Nonanticipativity and measurability).

1. $\mathcal{G}_t \subset \mathcal{F}_t$, $\forall t = 0, \dots, T$.

2. Mappings $\Gamma_t^{as}$ are $\mathcal{G}_t$-measurable, $\forall t = 0, \dots, T-1$.

Assumption 2 allows us to perform all the integration and differentiation operations needed to obtain optimality conditions. The measurability condition on $\Gamma_t^{as}$ in Assumption 3 expresses that the almost-sure constraints must be measurable with respect to the information $\mathcal{G}_t$ available to the decision maker, that is, the same information as the decision variables. The first condition in Assumption 3 expresses the causality of the problem: the decision maker has no access to information in the future! Under this assumption, by Proposition 1.1, there exists a measurable mapping $h_t : \mathbb{W}_0 \times \dots \times \mathbb{W}_t \to \mathbb{Y}_t$ such that the information variable $\boldsymbol{Y}_t$ writes

$$\boldsymbol{Y}_t = h_t(\boldsymbol{W}_0, \dots, \boldsymbol{W}_t), \quad \mathbb{P}\text{-a.s.}$$

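From Assumption 1 and using Lemma 1.3 (whose measurability hypothesis is provided by Assumption 3), the following property is readily available.

Proposition 2.2 (Constraints structure). For all $t = 0, \dots, T-1$,

1. $\mathcal{U}_t^{as}$ is a closed convex subset of $\mathcal{U}_t$,

2. $\mathrm{proj}_{\mathcal{U}_t^{as}}(\mathcal{U}_t^{mc}) \subset \mathcal{U}_t^{mc}$.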

2.2. Optimality conditions in stochastic optimal control problems. We present here necessary optimality conditions for the stochastic optimal control problem (2.5), which are an extension of the conditions given in Theorem 1.4.

2.2.1. Non-adapted optimality conditions. A first set of optimality conditions is given in the next theorem.

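Theorem 2.3. Let the two random processes $(\boldsymbol{X}_t)_{t=0,\dots,T} \in \mathcal{X}$ and $(\boldsymbol{U}_t)_{t=0,\dots,T-1} \in \mathcal{U}$ be a solution of Problem (2.5). Suppose that Assumptions 1, 2 and 3 are satisfied. Then there exists a random process $(\boldsymbol{\lambda}_t)_{t=0,\dots,T} \in \mathcal{X}$ such that, for all $t = 0, \dots, T-1$,

$$\boldsymbol{X}_0 = \boldsymbol{W}_0, \qquad (2.7a)$$
$$\boldsymbol{X}_{t+1} = f_t(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}), \qquad (2.7b)$$
$$\boldsymbol{\lambda}_T = K'(\boldsymbol{X}_T)^{\top}, \qquad (2.7c)$$
$$\boldsymbol{\lambda}_t = (L_t)'_x(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1})^{\top} + (f_t)'_x(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1})^{\top} \boldsymbol{\lambda}_{t+1}, \qquad (2.7d)$$
$$\mathbb{E}\Big((L_t)'_u(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}) + \boldsymbol{\lambda}_{t+1}^{\top} (f_t)'_u(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}) \,\Big|\, \mathcal{G}_t\Big) \in -\partial \chi_{\mathcal{U}_t^{as}}(\boldsymbol{U}_t). \qquad (2.7e)$$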

Proof. From the equivalence between Problem (2.5) and Problem (2.6), we obtain, using Theorem 1.4, that the solution $\boldsymbol{U}$ satisfies

$$\mathrm{proj}_{\mathcal{U}^{mc}}\big(J'(\boldsymbol{U})\big) \in -\partial \chi_{\mathcal{U}^{as}}(\boldsymbol{U}),$$

and therefore $\mathbb{E}\big(j'_{u_t}(\boldsymbol{U}, \boldsymbol{W}) \mid \mathcal{G}_t\big) \in -\partial \chi_{\mathcal{U}_t^{as}}(\boldsymbol{U}_t)$ for all $t = 0, \dots, T-1$. The desired result follows from Proposition 2.1. □

The conditions given by Theorem 2.3 are called non-adapted optimality conditions because the dual random process $(\boldsymbol{\lambda}_t)_{t=0,\dots,T}$ is not adapted to the natural filtration $(\mathcal{F}_t)_{t=0,\dots,T}$, that is, $\boldsymbol{\lambda}_t$ generally depends on the future. We will see in the next subsection that similar optimality conditions can be written with the help of an adapted dual random process.

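2.2.2. Adapted optimality conditions. The following theorem presents optimality conditions involving an adapted dual random process.

Theorem 2.4. Let the two random processes $(\boldsymbol{X}_t)_{t=0,\dots,T} \in \mathcal{X}$ and $(\boldsymbol{U}_t)_{t=0,\dots,T-1} \in \mathcal{U}$ be a solution of Problem (2.5). Assume that Assumptions 1, 2 and 3 are satisfied. Then there exists a process $(\bar{\boldsymbol{\lambda}}_t)_{t=0,\dots,T} \in \mathcal{X}$ adapted to the filtration $(\mathcal{F}_t)_{t=0,\dots,T}$ such that, for all $t = 0, \dots, T-1$,

$$\boldsymbol{X}_0 = \boldsymbol{W}_0, \qquad (2.8a)$$
$$\boldsymbol{X}_{t+1} = f_t(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}), \qquad (2.8b)$$
$$\bar{\boldsymbol{\lambda}}_T = K'(\boldsymbol{X}_T)^{\top}, \qquad (2.8c)$$
$$\bar{\boldsymbol{\lambda}}_t = \mathbb{E}\Big((L_t)'_x(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1})^{\top} + (f_t)'_x(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1})^{\top} \bar{\boldsymbol{\lambda}}_{t+1} \,\Big|\, \mathcal{F}_t\Big), \qquad (2.8d)$$
$$\mathbb{E}\Big((L_t)'_u(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}) + \bar{\boldsymbol{\lambda}}_{t+1}^{\top} (f_t)'_u(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}) \,\Big|\, \mathcal{G}_t\Big) \in -\partial \chi_{\mathcal{U}_t^{as}}(\boldsymbol{U}_t). \qquad (2.8e)$$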

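Proof. All assumptions of Theorem 2.3 are met, so that there exists a random process $(\boldsymbol{\lambda}_t)_{t=0,\dots,T}$ satisfying (2.7). Define for all $t$ the random variable $\bar{\boldsymbol{\lambda}}_t$ by

$$\bar{\boldsymbol{\lambda}}_t := \mathbb{E}(\boldsymbol{\lambda}_t \mid \mathcal{F}_t).$$

By construction, the process $(\bar{\boldsymbol{\lambda}}_t)_{t=0,\dots,T}$ is adapted to the filtration $(\mathcal{F}_t)_{t=0,\dots,T}$. At stage $T$, we have

$$\bar{\boldsymbol{\lambda}}_T = \mathbb{E}(\boldsymbol{\lambda}_T \mid \mathcal{F}_T) = \mathbb{E}\big(K'(\boldsymbol{X}_T)^{\top} \mid \mathcal{F}_T\big) = K'(\boldsymbol{X}_T)^{\top},$$

because $\boldsymbol{X}_T$ is $\mathcal{F}_T$-measurable. For all $t = T-1, \dots, 0$, using the law of total expectation $\mathbb{E}(\cdot \mid \mathcal{F}_t) = \mathbb{E}\big(\mathbb{E}(\cdot \mid \mathcal{F}_{t+1}) \mid \mathcal{F}_t\big)$ and since all variables $\boldsymbol{X}_t$, $\boldsymbol{U}_t$ and $\boldsymbol{W}_{t+1}$ are $\mathcal{F}_{t+1}$-measurable, we deduce from (2.7) that

$$\begin{aligned} \bar{\boldsymbol{\lambda}}_t &= \mathbb{E}\Big((L_t)'_x(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1})^{\top} + (f_t)'_x(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1})^{\top}\, \mathbb{E}(\boldsymbol{\lambda}_{t+1} \mid \mathcal{F}_{t+1}) \,\Big|\, \mathcal{F}_t\Big) \\ &= \mathbb{E}\Big((L_t)'_x(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1})^{\top} + (f_t)'_x(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1})^{\top}\, \bar{\boldsymbol{\lambda}}_{t+1} \,\Big|\, \mathcal{F}_t\Big), \end{aligned}$$

hence the adapted backward dynamics equations given in (2.8).

Assumption 3 implies that $\mathcal{G}_t \subset \mathcal{F}_t \subset \mathcal{F}_{t+1}$. Using $\mathbb{E}(\cdot \mid \mathcal{G}_t) = \mathbb{E}\big(\mathbb{E}(\cdot \mid \mathcal{F}_{t+1}) \mid \mathcal{G}_t\big)$ and the measurability properties of $\boldsymbol{X}_t$, $\boldsymbol{U}_t$ and $\boldsymbol{W}_{t+1}$, the last optimality condition in (2.7) becomes

$$\mathbb{E}\Big((L_t)'_u(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}) + \mathbb{E}(\boldsymbol{\lambda}_{t+1} \mid \mathcal{F}_{t+1})^{\top} (f_t)'_u(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}) \,\Big|\, \mathcal{G}_t\Big) \in -\partial \chi_{\mathcal{U}_t^{as}}(\boldsymbol{U}_t),$$

hence the last optimality condition given in (2.8). □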
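The identity $\bar{\boldsymbol{\lambda}}_t = \mathbb{E}(\boldsymbol{\lambda}_t \mid \mathcal{F}_t)$ used in this proof can be checked exhaustively on a small scenario set. Assume (our choice, for illustration only) $T = 3$, noises $\boldsymbol{W}_0, \dots, \boldsymbol{W}_T$ i.i.d. uniform on $\{-1, +1\}$, the linear-quadratic data $f_t(x, u, w) = x + u + w$, $L_t(x, u, w) = \frac{1}{2}(x^2 + u^2)$, $K(x) = \frac{1}{2}x^2$, and the $\mathcal{F}_t$-adapted feedback $\boldsymbol{U}_t = -\boldsymbol{X}_t / 2$, which need not be optimal since the identity does not rely on optimality. Conditioning on $\mathcal{F}_t$ amounts to averaging over the scenarios sharing the prefix $(w_0, \dots, w_t)$. The sketch computes $\boldsymbol{\lambda}$ from (2.7c)-(2.7d) and $\bar{\boldsymbol{\lambda}}$ from its own recursion (2.8c)-(2.8d), then compares them.

```python
import itertools
import numpy as np

# Hypothetical scenario set: T = 3, W_0..W_T i.i.d. uniform on {-1, +1}, all 16 paths enumerated.
T = 3
scen = np.array(list(itertools.product([-1.0, 1.0], repeat=T + 1)))  # row = (w_0, ..., w_T)
prob = np.full(len(scen), 1.0 / len(scen))

def cond_exp(z, t):
    """E(z | F_t), F_t = sigma(W_0, ..., W_t): average over paths sharing (w_0, ..., w_t)."""
    keys = [tuple(row[: t + 1]) for row in scen]
    out = np.empty_like(z)
    for k in set(keys):
        atom = np.array([key == k for key in keys])
        out[atom] = np.dot(prob[atom], z[atom]) / prob[atom].sum()
    return out

# States under the feedback U_t = -X_t / 2 with f_t(x, u, w) = x + u + w.
X = np.empty((len(scen), T + 1))
X[:, 0] = scen[:, 0]                                  # X_0 = W_0
for t in range(T):
    u = -0.5 * X[:, t]
    X[:, t + 1] = X[:, t] + u + scen[:, t + 1]

# Non-adapted co-state (2.7c)-(2.7d); here (L_t)'_x = x, (f_t)'_x = 1 and K'(x) = x.
lam = np.empty_like(X)
lam[:, T] = X[:, T]
for t in reversed(range(T)):
    lam[:, t] = X[:, t] + lam[:, t + 1]

# Adapted co-state (2.8c)-(2.8d), computed by its own backward recursion.
lam_bar = np.empty_like(X)
lam_bar[:, T] = X[:, T]
for t in reversed(range(T)):
    lam_bar[:, t] = cond_exp(X[:, t] + lam_bar[:, t + 1], t)

print(all(np.allclose(lam_bar[:, t], cond_exp(lam[:, t], t)) for t in range(T + 1)))  # prints True
```

By contrast, `lam[:, 0]` is not constant on the atoms of $\mathcal{F}_0$: the non-adapted co-state genuinely depends on the future noises $\boldsymbol{W}_1, \dots, \boldsymbol{W}_T$.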

Note that in the optimality conditions given in Theorem 2.4, at each stage $t$, the gradient is projected onto the subspace of $\mathcal{G}_t$-measurable random variables ($\mathcal{G}_t$ being the observation $\sigma$-field), whereas the adapted dual random variable is projected onto the subspace of $\mathcal{F}_t$-measurable random variables, $(\mathcal{F}_t)_{t=0,\dots,T}$ being the natural filtration of Problem (2.5).
2.3. Optimality conditions in the Markovian case. As far as the information structure is concerned, the optimality conditions (2.7) and (2.8) were obtained assuming only nonanticipativeness (see Assumption 3) and the fact that the observation $\sigma$-fields $(\mathcal{G}_t)_{t=0,\dots,T}$ do not depend on the decisions $(\boldsymbol{U}_t)_{t=0,\dots,T-1}$. The latter assumption allows us to avoid the so-called "dual effect of control" (see [3] for further details). Another feature often available in practice for the information structure is the "perfect memory" property, which intuitively means that information is not lost over time. This property implies that $(\mathcal{G}_t)_{t=0,\dots,T}$ is a filtration, namely

$$\mathcal{G}_t \subset \mathcal{G}_{t+1}, \quad \forall t = 0, \dots, T-1.$$

We will assume in the sequel that the perfect memory property holds, and we will moreover assume complete causal noise observation, so that $\mathcal{G}_t = \mathcal{F}_t$, $\forall t = 0, \dots, T$⁵. Then both optimality conditions (2.7) and (2.8) involve conditional expectations with respect to a random observation variable whose dimension increases with time (a new noise random variable becomes available at each stage $t$). This leads to a computational difficulty of the same nature as the so-called curse of dimensionality. To address this difficulty, it would be preferable to have an observation space of constant dimension, as in the stochastic dynamic programming principle [5] [5], where the optimal control at $t$ only depends on the state variable at the same time stage. We thus consider new (more restrictive) assumptions which match the stochastic optimal control framework and lead us to the desired situation.

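Assumption 4 (Markovian case).

1. $\mathcal{G}_t = \mathcal{F}_t$, $\forall t = 0, \dots, T$ (perfect memory and causal noise observation).

2. The random variables $\boldsymbol{W}_0, \dots, \boldsymbol{W}_T$ are independent (white noise).

3. The mappings $\Gamma_t^{as}$ are constant (deterministic constraints):

$$\forall t = 0, \dots, T-1, \ \exists\, \Gamma_t \subset \mathbb{U}_t, \quad \Gamma_t^{as}(\omega) = \Gamma_t, \ \mathbb{P}\text{-a.s.}$$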

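The standard formulation of a stochastic optimal control problem in the Markovian case assumes that the state is completely and perfectly observed. The problem formulation is accordingly

$$\min_{(\boldsymbol{U}, \boldsymbol{X})} \mathbb{E}\Big(\sum_{t=0}^{T-1} L_t(\boldsymbol{X}_t, \boldsymbol{U}_t, \boldsymbol{W}_{t+1}) + K(\boldsymbol{X}_T)\Big) \qquad (2.9a)$$

subject to both the dynamics constraints (2.2) and the control constraints

$$\boldsymbol{U}_t \in \Gamma_t, \ \mathbb{P}\text{-a.s.}, \quad \text{and} \quad \boldsymbol{U}_t \preceq \boldsymbol{X}_t, \quad \forall t = 0, \dots, T-1. \qquad (2.9b)$$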

We now consider the optimality conditions (2.7) and (2.8), and we specialize them to the Markovian case.

2.3.1. Markovian case: non-adapted optimality conditions. We present a non-adapted version of the optimality conditions of Problem (2.5) under Markovian assumptions. We begin with a result inspired by the stochastic dynamic programming principle.

Theorem 2.5. Suppose that Assumptions […] and […] are fulfilled. Assume that there exist two random processes $(U_t)_{t=0,\dots,T-1} \in \mathcal{U}$ and $(X_t)_{t=0,\dots,T} \in \mathcal{X}$ solution of Problem (2.5). Then there exists a process $(\Lambda_t)_{t=0,\dots,T} \in \mathcal{X}$ satisfying (2.7) and such that, for all $t = 0,\dots,T-1$,

(a) $\Lambda_{t+1} \preceq (X_{t+1},W_{t+2},\dots,W_T)$,

(b) $U_t \preceq X_t$,

(c) $\mathbb{E}\big((L_t)'_u(X_t,U_t,W_{t+1}) + \Lambda_{t+1}^{\top}(f_t)'_u(X_t,U_t,W_{t+1}) \mid \mathcal{F}_t\big) = \mathbb{E}\big((L_t)'_u(X_t,U_t,W_{t+1}) + \Lambda_{t+1}^{\top}(f_t)'_u(X_t,U_t,W_{t+1}) \mid X_t\big)$, $\mathbb{P}$-a.s.

5 Note that complete causal noise observation implies the perfect memory property, since $(\mathcal{F}_t)_{t=0,\dots,T}$ is a filtration.

Proof. First, Assumption 3 is implied by Assumption 4, so the existence of the process $(\Lambda_t)_{t=0,\dots,T} \in \mathcal{X}$ satisfying (2.7) is given by Theorem 2.3. Denote by $H_t$ the Hamiltonian at time t, namely $H_t(x,u,w,\lambda) = L_t(x,u,w) + \lambda^{\top} f_t(x,u,w)$. Then the optimality conditions (2.7d) and (2.7e) write

$$\Lambda_t = (H_t)'^{\top}_x(X_t,U_t,W_{t+1},\Lambda_{t+1}), \tag{2.10a}$$

$$\mathbb{E}\big((H_t)'_u(X_t,U_t,W_{t+1},\Lambda_{t+1}) \mid \mathcal{F}_t\big) \in -\partial\chi_{U_t^{ad}}(U_t). \tag{2.10b}$$

Statements (a) and (b) are proved by induction. For the sake of simplicity, we first prove the result when $U_t^{ad} = \mathbb{U}_t$, so that (2.10b) reduces to the equality condition:

$$\mathbb{E}\big((H_t)'_u(X_t,U_t,W_{t+1},\Lambda_{t+1}) \mid \mathcal{F}_t\big) = 0. \tag{2.10c}$$

• At stage T, we know from (2.7c) that $\Lambda_T = \mu_{T-1}(X_T)$, with $\mu_{T-1}$ a measurable function (see footnote 6). Hence

$$\Lambda_T \preceq X_T. \tag{2.11a}$$

Then, using (2.7b), the optimality condition (2.10c) takes the form:

$$\mathbb{E}\Big((H_{T-1})'_u\big(X_{T-1},U_{T-1},W_T,\mu_{T-1}\circ f_{T-1}(X_{T-1},U_{T-1},W_T)\big) \mid \mathcal{F}_{T-1}\Big) = 0.$$

$X_{T-1}$ and $U_{T-1}$ are both $\mathcal{F}_{T-1}$-measurable random variables, and $W_T$ is independent of $\mathcal{F}_{T-1}$ (white noise assumption). We deduce that the conditional expectation in the last expression reduces to an expectation. Let $G_{T-1}$ denote the function resulting from its integration, namely

$$G_{T-1}(x,u) = \mathbb{E}\Big((H_{T-1})'_u\big(x,u,W_T,\mu_{T-1}\circ f_{T-1}(x,u,W_T)\big)\Big).$$

$G_{T-1}$ is a measurable mapping (see footnote 7), and the optimality condition writes

$$G_{T-1}(X_{T-1},U_{T-1}) = 0.$$

We now use the measurable selection theorem available for implicit measurable functions [14, Theorem 8] (see footnote 8). We deduce that there exists a measurable mapping $\gamma_{T-1} : \mathbb{X}_{T-1} \to \mathbb{U}_{T-1}$ such that $G_{T-1}\big(X_{T-1},\gamma_{T-1}(X_{T-1})\big) = 0$. As a conclusion, the control variable $U_{T-1} = \gamma_{T-1}(X_{T-1})$ satisfies the optimality condition (2.10c) at $t = T-1$ and is such that

$$U_{T-1} \preceq X_{T-1}. \tag{2.11b}$$

6 In fact, $\mu_{T-1} = K'^{\top}$. From Assumption 2, $\mu_{T-1}$ is a continuous mapping.

7 In fact a continuous one (from Assumption 2).

8 See for instance [21, Section 7] for a survey of measurable selection theorems corresponding to the implicit case. Note that [14, Theorem 8] needs a particular assumption concerning the σ-field equipping $\mathbb{X}_{T-1}$. We assume here that such an assumption holds.

• At stage t, assume that $\Lambda_{t+1} \preceq (X_{t+1},W_{t+2},\dots,W_T)$. Then there exists a measurable function $\mu_t$ such that

$$\Lambda_{t+1} = \mu_t(X_{t+1},W_{t+2},\dots,W_T). \tag{2.12a}$$

The optimality condition (2.10c) at stage t takes the form

$$\mathbb{E}\Big((H_t)'_u\big(X_t,U_t,W_{t+1},\mu_t(f_t(X_t,U_t,W_{t+1}),W_{t+2},\dots,W_T)\big) \mid \mathcal{F}_t\Big) = 0.$$

With the same reasoning as at stage T, we deduce that this conditional expectation reduces to an expectation, so that the optimality condition writes

$$G_t(X_t,U_t) = 0,$$

$G_t$ being a measurable function given by

$$G_t(x,u) = \mathbb{E}\Big((H_t)'_u\big(x,u,W_{t+1},\mu_t(f_t(x,u,W_{t+1}),W_{t+2},\dots,W_T)\big)\Big).$$

Using again [14, Theorem 8], we deduce that there exists a measurable mapping $\gamma_t : \mathbb{X}_t \to \mathbb{U}_t$ such that $U_t = \gamma_t(X_t)$ satisfies the optimality condition (2.10c) at t. We accordingly have

$$U_t \preceq X_t. \tag{2.12b}$$

Finally, start from the optimality condition (2.10a), namely

$$\Lambda_t = (H_t)'^{\top}_x(X_t,U_t,W_{t+1},\Lambda_{t+1}).$$

Using the induction assumption (2.12a) together with (2.12b) and (2.7c), we obtain that

$$\Lambda_t = (H_t)'^{\top}_x\Big(X_t,\gamma_t(X_t),W_{t+1},\mu_t\big(f_t(X_t,\gamma_t(X_t),W_{t+1}),W_{t+2},\dots,W_T\big)\Big).$$

We conclude that $\Lambda_t \preceq (X_t,W_{t+1},\dots,W_T)$, so that the desired result holds true (see footnote 9).

Let us now turn to the general case $U_t^{ad} \subset \mathbb{U}_t$. From (A.2), the optimality condition (2.10b) is again equivalent to an equality condition:

$$\operatorname{proj}_{U_t^{ad}}\Big(U_t - \epsilon\,\mathbb{E}\big((H_t)'_u(X_t,U_t,W_{t+1},\Lambda_{t+1}) \mid \mathcal{F}_t\big)\Big) - U_t = 0.$$

The same arguments as in the previous case remain valid.

Finally, consider $U_t \preceq X_t$, $\Lambda_{t+1} \preceq (X_t,W_{t+1},\dots,W_T)$ and the white noise assumption. From these we deduce that $\Lambda_{t+1}$ depends on $\mathcal{F}_t$ only through $X_t$, so that

$$\mathbb{E}\big((L_t)'_u(X_t,U_t,W_{t+1}) + \Lambda_{t+1}^{\top}(f_t)'_u(X_t,U_t,W_{t+1}) \mid \mathcal{F}_t\big) = \mathbb{E}\big((L_t)'_u(X_t,U_t,W_{t+1}) + \Lambda_{t+1}^{\top}(f_t)'_u(X_t,U_t,W_{t+1}) \mid X_t\big).$$



9 Note that we obtained as an intermediate result that $\Lambda_{t+1} \preceq (X_t,W_{t+1},\dots,W_T)$.

Statement (c) is thus satisfied and the proof is complete. □

In the Markovian case, the optimal solution of Problem (2.5) (measurability with respect to all past noise variables) satisfies the measurability constraints of Problem (2.9) (measurability with respect to the current state variable). The two problems are therefore equivalent (same min and argmin). Indeed, the feasible set of (2.5) contains the feasible set of (2.9). Moreover, Theorem 2.5 shows that any optimal solution of (2.5) is feasible for (2.9), and is therefore also optimal for (2.9). Ultimately, the optimality conditions of Problem (2.9) can be written as

$$X_0 = W_0, \tag{2.13a}$$

$$X_{t+1} = f_t(X_t,U_t,W_{t+1}), \tag{2.13b}$$

$$\Lambda_T = K'^{\top}(X_T), \tag{2.13c}$$

$$\Lambda_t = (L_t)'^{\top}_x(X_t,U_t,W_{t+1}) + (f_t)'^{\top}_x(X_t,U_t,W_{t+1})\,\Lambda_{t+1}, \tag{2.13d}$$

$$\mathbb{E}\big((L_t)'_u(X_t,U_t,W_{t+1}) + \Lambda_{t+1}^{\top}(f_t)'_u(X_t,U_t,W_{t+1}) \mid X_t\big) \in -\partial\chi_{U_t^{ad}}(U_t). \tag{2.13e}$$



Remark 3. Let $(G^1_t)_{t=0,\dots,T-1}$ and $(G^2_t)_{t=0,\dots,T-1}$ be the gradient processes associated with (2.7) and (2.13) respectively:

$$G^1_t := \mathbb{E}\big((L_t)'^{\top}_u(X_t,U_t,W_{t+1}) + (f_t)'^{\top}_u(X_t,U_t,W_{t+1})\,\Lambda_{t+1} \mid \mathcal{F}_t\big),$$

$$G^2_t := \mathbb{E}\big((L_t)'^{\top}_u(X_t,U_t,W_{t+1}) + (f_t)'^{\top}_u(X_t,U_t,W_{t+1})\,\Lambda_{t+1} \mid X_t\big).$$

For Problem (2.5), the optimality conditions can be obtained by differentiating the Lagrangian function. This is not possible for the optimality conditions (2.13) of Problem (2.9), because the conditioning term $X_t$ itself depends on the control variables $U_t$. Consequently, $G^2_t$ is not claimed to represent the projected gradient of Problem (2.9). The equality between the gradients $G^1_t$ and $G^2_t$ holds true only at the optimum.

2.3.2. Markovian case: adapted optimality conditions. We now present the adapted version of the optimality conditions of Problem (2.5) under Markovian assumptions.

Theorem 2.6. Suppose that Assumptions […] and […] are fulfilled. Assume that there exist two random processes $(U_t)_{t=0,\dots,T-1} \in \mathcal{U}$ and $(X_t)_{t=0,\dots,T} \in \mathcal{X}$ solution of Problem (2.5). Then there exists a process $(\Lambda_t)_{t=0,\dots,T} \in \mathcal{X}$ satisfying (2.8) and such that, for all $t = 0,\dots,T-1$,

(a) $\Lambda_{t+1} \preceq X_{t+1}$,

(b) $U_t \preceq X_t$,

(c) $\mathbb{E}\big((L_t)'_u(X_t,U_t,W_{t+1}) + \Lambda_{t+1}^{\top}(f_t)'_u(X_t,U_t,W_{t+1}) \mid \mathcal{F}_t\big) = \mathbb{E}\big((L_t)'_u(X_t,U_t,W_{t+1}) + \Lambda_{t+1}^{\top}(f_t)'_u(X_t,U_t,W_{t+1}) \mid X_t\big)$, $\mathbb{P}$-a.s.



Proof. The proof follows the same scheme as that of Theorem 2.5. We just point out that $\Lambda_{t+1} \preceq (X_{t+1},W_{t+2},\dots,W_T)$ and $\Lambda_{t+1} = \mathbb{E}(\Lambda_{t+1} \mid \mathcal{F}_{t+1})$ together imply that $\Lambda_{t+1} \preceq X_{t+1}$. □
In the Markovian framework, Problem (2.5) (measurability with respect to the past noise) and Problem (2.9) (measurability with respect to the state) are equivalent. We may therefore consider that the optimality conditions of Problem (2.9) in the adapted version are

$$X_0 = W_0, \tag{2.14a}$$

$$X_{t+1} = f_t(X_t,U_t,W_{t+1}), \tag{2.14b}$$

$$\Lambda_T = K'^{\top}(X_T), \tag{2.14c}$$

$$\Lambda_t = \mathbb{E}\big((L_t)'^{\top}_x(X_t,U_t,W_{t+1}) + (f_t)'^{\top}_x(X_t,U_t,W_{t+1})\,\Lambda_{t+1} \mid X_t\big), \tag{2.14d}$$

$$\mathbb{E}\big((L_t)'_u(X_t,U_t,W_{t+1}) + \Lambda_{t+1}^{\top}(f_t)'_u(X_t,U_t,W_{t+1}) \mid X_t\big) \in -\partial\chi_{U_t^{ad}}(U_t). \tag{2.14e}$$



The optimality conditions (2.13) and (2.14) involve conditional expectations with respect to the state variable. In most cases the dimension of this variable is fixed, that is, it does not depend on the time stage. In order to solve Problem (2.5) (and equivalently Problem (2.9)), we have to discretize those conditions, and in particular to approximate the conditional expectations. The literature on conditional expectation approximation only offers biased estimators, with an integrated squared error that depends on the dimension of the conditioning term. On the contrary, approximating an expectation through a Monte Carlo technique involves unbiased estimators whose variance does not depend on the dimension of the underlying space. In the next section, we propose a functional interpretation of the stochastic optimal control problem in order to get rid of those conditional expectations and deal only with expectations.

2.4. Optimality conditions from a functional point of view. Consider the stochastic optimal control Problem (2.5). Under Markovian assumptions, we have shown in §2.3 that it is equivalent to Problem (2.9) and that (2.14) is a set of necessary optimality conditions. Hereafter, we transform the optimality conditions (2.14) using Theorem 2.6 and the functional interpretation of the measurability relation between random variables (see Proposition 1.1).

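Theorem 2.7. Suppose that Assumptions […], […] and […] are fulfilled. Assume that there exist two random processes $(U_t)_{t=0,\dots,T-1} \in \mathcal{U}$ and $(X_t)_{t=0,\dots,T} \in \mathcal{X}$ solution of Problem (2.5). Let $(\Lambda_t)_{t=0,\dots,T} \in \mathcal{X}$ be a random process satisfying the optimality conditions (2.8). Then there exist two sequences of mappings $(\boldsymbol{\Lambda}_t)_{t=0,\dots,T}$ and $(\phi_t)_{t=0,\dots,T-1}$, with $\boldsymbol{\Lambda}_t : \mathbb{X}_t \to \mathbb{X}_t$ and $\phi_t : \mathbb{X}_t \to \mathbb{U}_t$, such that for all $t = 0,\dots,T-1$,

$$\Lambda_t = \boldsymbol{\Lambda}_t(X_t), \tag{2.15a}$$

$$U_t = \phi_t(X_t), \tag{2.15b}$$

$$\boldsymbol{\Lambda}_t(\cdot) = \mathbb{E}\Big((L_t)'^{\top}_x(\cdot,\phi_t(\cdot),W_{t+1}) + (f_t)'^{\top}_x(\cdot,\phi_t(\cdot),W_{t+1})\,\boldsymbol{\Lambda}_{t+1}\big(f_t(\cdot,\phi_t(\cdot),W_{t+1})\big)\Big), \tag{2.15c}$$

$$\mathbb{E}\Big((L_t)'_u(\cdot,\phi_t(\cdot),W_{t+1}) + \boldsymbol{\Lambda}_{t+1}^{\top}\big(f_t(\cdot,\phi_t(\cdot),W_{t+1})\big)(f_t)'_u(\cdot,\phi_t(\cdot),W_{t+1})\Big) \in -\partial\chi_{\Phi_t^{ad}}(\phi_t), \tag{2.15d}$$

Here $\Phi_t^{ad} := \{\phi_t \in L^2(\mathbb{X}_t,\mathfrak{X}_t,\mathbb{P}_{X_t};\mathbb{U}_t) \mid \phi_t(x) \in U_t^{ad},\ \forall x \in \mathbb{X}_t\}$, where $\mathfrak{X}_t$ is the σ-field equipping $\mathbb{X}_t$ and $\mathbb{P}_{X_t}$ is the probability measure associated with $X_t$.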
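Proof. By Theorem 2.6, the random processes $(U_t)_{t=0,\dots,T-1}$, $(X_t)_{t=0,\dots,T}$ and $(\Lambda_t)_{t=0,\dots,T}$ are such that $U_t \preceq X_t$ and $\Lambda_t \preceq X_t$. By Proposition 1.1, there exist measurable mappings $\phi_t : \mathbb{X}_t \to \mathbb{U}_t$ and $\boldsymbol{\Lambda}_t : \mathbb{X}_t \to \mathbb{X}_t$ such that $U_t = \phi_t(X_t)$ and $\Lambda_t = \boldsymbol{\Lambda}_t(X_t)$. From $U_t \in \mathcal{U}_t = L^2(\Omega,\mathcal{A},\mathbb{P};\mathbb{U}_t)$, we deduce that $\phi_t \in L^2(\mathbb{X}_t,\mathfrak{X}_t,\mathbb{P}_{X_t};\mathbb{U}_t)$:

$$\int_\Omega \|U_t(\omega)\|^2\,d\mathbb{P}(\omega) = \int_\Omega \|\phi_t(X_t(\omega))\|^2\,d\mathbb{P}(\omega) = \int_{\mathbb{X}_t} \|\phi_t(x)\|^2\,d\mathbb{P}_{X_t}(x) < +\infty.$$

Since $U_t \in U_t^{ad}$, that is, $\phi_t(X_t(\omega)) \in U_t^{ad}$, $\mathbb{P}$-a.s., we obtain that $\phi_t \in \Phi_t^{ad}$.

The co-state dynamic equation in (2.14) rewrites

$$\boldsymbol{\Lambda}_t(X_t) = \mathbb{E}\Big((L_t)'^{\top}_x(X_t,\phi_t(X_t),W_{t+1}) + (f_t)'^{\top}_x(X_t,\phi_t(X_t),W_{t+1})\,\boldsymbol{\Lambda}_{t+1}(X_{t+1}) \;\Big|\; X_t\Big).$$

Using $X_{t+1} = f_t(X_t,\phi_t(X_t),W_{t+1})$, this becomes

$$\boldsymbol{\Lambda}_t(X_t) = \mathbb{E}\Big((L_t)'^{\top}_x(X_t,\phi_t(X_t),W_{t+1}) + (f_t)'^{\top}_x(X_t,\phi_t(X_t),W_{t+1})\,\boldsymbol{\Lambda}_{t+1}\big(f_t(X_t,\phi_t(X_t),W_{t+1})\big) \;\Big|\; X_t\Big).$$

From the white noise assumption, the random variable $W_{t+1}$ is independent of $X_t$. The conditional expectation therefore turns out to be just an expectation with respect to the noise random variable. The co-state dynamic equation accordingly writes as a functional equality:

$$\boldsymbol{\Lambda}_t(\cdot) = \mathbb{E}\Big((L_t)'^{\top}_x(\cdot,\phi_t(\cdot),W_{t+1}) + (f_t)'^{\top}_x(\cdot,\phi_t(\cdot),W_{t+1})\,\boldsymbol{\Lambda}_{t+1}\big(f_t(\cdot,\phi_t(\cdot),W_{t+1})\big)\Big).$$

Using similar arguments, we easily obtain from the last condition in (2.14):

$$\mathbb{E}\Big((L_t)'_u(\cdot,\phi_t(\cdot),W_{t+1}) + \boldsymbol{\Lambda}_{t+1}^{\top}\big(f_t(\cdot,\phi_t(\cdot),W_{t+1})\big)(f_t)'_u(\cdot,\phi_t(\cdot),W_{t+1})\Big) \in -\partial\chi_{\Phi_t^{ad}}(\phi_t),$$

which completes the proof. □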

Theorem 2.7 provides the new functional optimality conditions (2.15) for Problem (2.5) in the Markovian case. These optimality conditions do not involve conditional expectations, but only expectations. Therefore, we may hope that approximating these conditions will yield unbiased estimates whose variance does not depend on the dimension of the state space.

3. Adaptive discretization technique. In this section we develop tractable numerical methods for obtaining the solution of Problem (2.5). We limit ourselves here to the Markovian framework; methods for all the cases described in […] can be found in [9]. We briefly discuss two classical solution methods, namely stochastic programming and dynamic programming. Then we present an adaptive mesh algorithm. It discretizes the optimality conditions obtained in §2.4 and uses them in a gradient-like algorithm.

3.1. Discrete representation of a function. For numerical resolution, we need to manipulate functions. These are infinite-dimensional objects which, in most cases, do not have a closed-form expression. We must therefore use a discrete representation of such objects. Let $\phi : \mathbb{X} \to \mathbb{U}$ be a function. We suppose that we have at our disposal a fixed or variable grid $x$ in $\mathbb{X}$, that is, a collection of elements in $\mathbb{X}$:

$$x = (x^i)_{i=1,\dots,n} \in \mathbb{X}^n.$$
A point $x^i$ will be called a particle. In order to obtain a discrete representation of $\phi$, we define its trace. This is a grid $u$ on $\mathbb{U}$ which corresponds to the values of $\phi$ on $x$:

$$u = \big(\phi(x^i)\big)_{i=1,\dots,n}.$$

We denote by $\mathfrak{T}_x : \mathbb{U}^{\mathbb{X}} \to \mathbb{X}^n \times \mathbb{U}^n$ the trace operator. With any function $\phi : \mathbb{X} \to \mathbb{U}$, it associates the couple of grids $(x,u)$, that is, the n points $(x^i,\phi(x^i)) \in \mathbb{X} \times \mathbb{U}$. Conversely, knowing the trace of $\phi$, we need to compute an approximation of the value taken by $\phi$ at any $x \in \mathbb{X}$. Let $\mathfrak{R}_{\mathbb{U}} : \mathbb{X}^n \times \mathbb{U}^n \to \mathbb{U}^{\mathbb{X}}$ be an interpolation-regression operator. With any couple of grids $(x,u)$, it associates a function $\mathfrak{R}_{\mathbb{U}}(x,u) : \mathbb{X} \to \mathbb{U}$ representing the initial function. Such an operator may be defined in different ways (polynomial interpolation, kernel approximation, nearest neighbor, etc.).
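The two operators of §3.1 can be made concrete for a scalar state and a scalar control. The following is a minimal sketch, assuming that the interpolation-regression operator is piecewise-linear interpolation with constant extrapolation outside the grid (one of the possible choices listed above). The function names `trace` and `interpolation_regression` are hypothetical.

```python
import bisect
from typing import Callable, List, Tuple


def trace(phi: Callable[[float], float], x: List[float]) -> Tuple[List[float], List[float]]:
    """Trace operator: the couple of grids (x, u) with u^i = phi(x^i)."""
    return list(x), [phi(xi) for xi in x]


def interpolation_regression(x: List[float], u: List[float]) -> Callable[[float], float]:
    """Piecewise-linear interpolation of the pairs (x^i, u^i), constant outside the grid."""
    pairs = sorted(zip(x, u))              # particles need not come sorted
    xs = [p[0] for p in pairs]
    us = [p[1] for p in pairs]

    def phi_hat(y: float) -> float:
        if y <= xs[0]:
            return us[0]
        if y >= xs[-1]:
            return us[-1]
        i = bisect.bisect_right(xs, y) - 1  # xs[i] <= y < xs[i + 1]
        s = (y - xs[i]) / (xs[i + 1] - xs[i])
        return (1.0 - s) * us[i] + s * us[i + 1]

    return phi_hat


# Usage: trace of phi(z) = z^2 on five particles, then evaluation off the grid.
x, u = trace(lambda z: z * z, [0.0, 0.5, 1.0, 1.5, 2.0])
phi_hat = interpolation_regression(x, u)
print(phi_hat(0.75))  # 0.625, the mean of phi(0.5) = 0.25 and phi(1.0) = 1.0
```

Kernel or nearest-neighbor operators would only change the body of `phi_hat`. The trace itself stays the same couple of grids.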

3.2. Stochastic programming. One way to discretize stochastic optimal control problems of type (2.5) is to model the information structure as a decision tree.

1. First, simulate a given number N of scenarios $(W_t^k)_{t=0,\dots,T}^{k=1,\dots,N}$ of the noise process. Then, by some tree generation procedure (see e.g. […] or […]), organize these scenarios in a scenario tree. At any node of any time stage t, there is a single past trajectory but multiple futures: see Figure 3.1.



[Figure: two panels drawn over the stages t = 0, 1, 2, 3, ..., T, titled "N scenarios" (left) and "Scenarios tree" (right).]

Fig. 3.1. From scenarios to a tree

2. Second, write the components of the problem (state dynamics and cost expectation) on the scenario tree. Note that the information constraints are built into such a tree structure.

Then the approximation of Problem (2.5) on the scenario tree is solved using an appropriate (deterministic) nonlinear programming package. The optimal solution consists of state and control particles at each node of the tree. An interpolation-regression procedure (as suggested in §3.1) then has to be performed at each time stage in order to synthesize a feedback law.
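The tree property "one past, several futures" can be illustrated with a small sketch. It assumes the noise has already been discretized into finitely many values (small integers here), so that scenarios sharing an identical past can simply be merged. This is only an illustration; the tree generation procedures cited above are more elaborate. The function `tree_nodes` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def tree_nodes(scenarios: Sequence[Sequence[int]]) -> List[Dict[Tuple[int, ...], List[int]]]:
    """For each stage t, map every distinct past (w_0, ..., w_t) to the scenarios sharing it.

    Each key is a node of the tree: one past, possibly several futures. Attaching a single
    decision variable to each node enforces nonanticipativity by construction.
    """
    horizon = len(scenarios[0]) - 1
    stages = []
    for t in range(horizon + 1):
        nodes: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
        for k, s in enumerate(scenarios):
            nodes[tuple(s[: t + 1])].append(k)
        stages.append(dict(nodes))
    return stages


# Usage: four scenarios over stages t = 0, 1, 2 with discretized noise values.
stages = tree_nodes([[0, 0, 1], [0, 0, 0], [0, 1, 1], [1, 0, 0]])
print([len(nodes) for nodes in stages])  # [2, 3, 4]
```

The node count grows toward the leaves, while the number of scenarios below each node shrinks. This is the trade-off discussed in the next paragraph.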

Such a methodology is relatively easy to implement, and in fact does not require any Markovian assumption. Nevertheless, it faces a serious drawback. At the first time stages (close to the tree root), only a few particles (or nodes) are available. Experience shows that the estimates of the optimal feedback they provide have a relatively small variance, because the number of pending scenarios at each such node is still large enough. On the contrary, at the final time stages (near the tree leaves), a large number of particles is available, but with a huge variance. In both cases, the feedback synthesis (the interpolation-regression process from the available grid) will be inaccurate. This observation will be illustrated later on the case study of §4.

3.3. Stochastic dynamic programming. We consider Problem (2.5) in the Markovian case, which is thus equivalent to Problem (2.9). By Theorem 2.5, the optimal control process $(U_t)_{t=0,\dots,T-1}$ can be sought as a collection of feedback laws $(\phi_t)_{t=0,\dots,T-1}$ depending on the state process $(X_t)_{t=0,\dots,T}$, that is

$$U_t = \phi_t(X_t),\ \mathbb{P}\text{-a.s.}$$

According to the Dynamic Programming Principle, the resolution is built up backward from t = T to t = 0, by solving the Bellman equation at each time stage for all state values. This principle leads to the following algorithm […].

Algorithm 1 (Stochastic Dynamic Programming).

• At stage T, define the Bellman function $V_T$ as

$$V_T(x) := K(x), \quad \forall x \in \mathbb{X}_T.$$

• Recursively for t = T-1,...,0, compute the Bellman function $V_t$ as

$$V_t(x) = \min_{u \in U_t^{ad}} \mathbb{E}\Big(L_t(x,u,W_{t+1}) + V_{t+1}\big(f_t(x,u,W_{t+1})\big)\Big), \quad \forall x \in \mathbb{X}_t,$$

the optimal feedback law $\phi_t$ being obtained as

$$\phi_t(x) \in \arg\min_{u \in U_t^{ad}} \mathbb{E}\Big(L_t(x,u,W_{t+1}) + V_{t+1}\big(f_t(x,u,W_{t+1})\big)\Big), \quad \forall x \in \mathbb{X}_t.$$

This algorithm is only conceptual, because it operates on the infinite-dimensional objects $V_t$ (and expectations cannot always be evaluated analytically). We must indeed manipulate those objects as indicated in §3.1. For every t = 0,...,T, let $x_t := (x_t^i)_{i=1,\dots,n_t}$ be a fixed grid of $n_t$ discretization points in the state space $\mathbb{X}_t$. We approximate the functions appearing in Algorithm 1 by their trace over that grid:

$$v_t^i = V_t(x_t^i),\quad \forall i = 1,\dots,n_t,\ \forall t = 0,\dots,T,$$

$$u_t^i = \phi_t(x_t^i),\quad \forall i = 1,\dots,n_t,\ \forall t = 0,\dots,T-1.$$

We also need to approximate the expectations by the Monte Carlo method. Let $(W_t^k)_{t=0,\dots,T}^{k=1,\dots,N}$ denote N independent and identically distributed scenarios of the random noise process. The discretized stochastic dynamic programming algorithm is as follows.

Algorithm 2 (Discretized Stochastic Dynamic Programming).

• At stage T,

compute the trace $v_T$ of the Bellman function $V_T$:

$$v_T^i = V_T(x_T^i),\quad \forall i = 1,\dots,n_T,$$






• Recursively for t = T-1,...,0,

approximate $V_{t+1}$ by interpolation-regression:

$$V_{t+1} = \mathfrak{R}_{\mathbb{R}}(x_{t+1},v_{t+1}),$$

compute the two grids $v_t$ and $u_t$, that is, for each $i = 1,\dots,n_t$,

$$v_t^i = \min_{u \in U_t^{ad}} \frac{1}{N}\sum_{k=1}^{N}\Big(L_t(x_t^i,u,W_{t+1}^k) + V_{t+1}\big(f_t(x_t^i,u,W_{t+1}^k)\big)\Big),$$

$$u_t^i \in \arg\min_{u \in U_t^{ad}} \frac{1}{N}\sum_{k=1}^{N}\Big(L_t(x_t^i,u,W_{t+1}^k) + V_{t+1}\big(f_t(x_t^i,u,W_{t+1}^k)\big)\Big),$$

and obtain the feedback law as

$$\phi_t = \mathfrak{R}_{\mathbb{U}_t}(x_t,u_t).$$
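The following is a minimal sketch of Algorithm 2 for a scalar state. It makes three assumptions that are not from the text: the minimization over u is replaced by an exhaustive search over a finite list of candidate controls, both interpolation-regression operators are piecewise-linear, and the problem is a hypothetical toy linear-quadratic one. The toy problem is used only because its exact feedback is known.

```python
import bisect


def linear_interp(grid, values):
    """Interpolation-regression operator chosen here: piecewise-linear, constant outside the grid."""
    def h(x):
        if x <= grid[0]:
            return values[0]
        if x >= grid[-1]:
            return values[-1]
        i = bisect.bisect_right(grid, x) - 1
        s = (x - grid[i]) / (grid[i + 1] - grid[i])
        return (1.0 - s) * values[i] + s * values[i + 1]
    return h


def discretized_sdp(T, x_grid, u_grid, noise, L, f, K):
    """Backward recursion of Algorithm 2 on a fixed, sorted state grid x_grid.

    noise[t] lists the samples W_{t+1}^1, ..., W_{t+1}^N used at stage t, and the
    minimization over u is an exhaustive search over the finite candidate list u_grid.
    Returns the feedback laws phi_0, ..., phi_{T-1} and the interpolated V_0.
    """
    V_next = K                                    # V_T = K is known analytically
    feedback = [None] * T
    for t in reversed(range(T)):
        v_t, u_t = [], []
        for x in x_grid:
            best_cost, best_u = float("inf"), None
            for u in u_grid:
                cost = sum(L(t, x, u, w) + V_next(f(t, x, u, w)) for w in noise[t]) / len(noise[t])
                if cost < best_cost:
                    best_cost, best_u = cost, u
            v_t.append(best_cost)
            u_t.append(best_u)
        V_next = linear_interp(x_grid, v_t)         # V_t from its trace (x_t, v_t)
        feedback[t] = linear_interp(x_grid, u_t)    # phi_t from its trace (x_t, u_t)
    return feedback, V_next


# Usage on a toy problem: x' = x + u + w, L = (x^2 + u^2)/2, K(x) = x^2/2, w = +-0.1.
x_grid = [-2.0 + 0.02 * i for i in range(201)]
u_grid = [-1.0 + 0.01 * i for i in range(201)]
phi, V0 = discretized_sdp(
    2, x_grid, u_grid, [[-0.1, 0.1], [-0.1, 0.1]],
    L=lambda t, x, u, w: 0.5 * (x * x + u * u),
    f=lambda t, x, u, w: x + u + w,
    K=lambda x: 0.5 * x * x,
)
print(phi[1](0.5), phi[0](0.5))  # close to the exact values -0.25 and -0.3
```

For this toy problem the exact feedbacks are $\phi_1(x) = -x/2$ and $\phi_0(x) = -0.6\,x$. At $t = 1$ the interpolation is not needed because $V_2 = K$, as noted in Remark 4.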

Remark 4. The interpolation-regression operator is in most cases mandatory in Algorithm 2. Indeed, for a given time stage t and a given control value $u \in \mathbb{U}_t$, there usually does not exist any index j such that $x_{t+1}^j = f_t(x_t^i,u,W_{t+1}^k)$. This means that $V_{t+1}$ has to be computed off the grid $x_{t+1}$. Note however that the Bellman function at time stage T is known analytically, so that interpolation is not needed for $V_T$.

This method faces an important difficulty: the curse of dimensionality. In practice, one generally discretizes each state coordinate at each time stage t using a scalar grid with a fixed number of points. The grid $x_t$ is then obtained as the Cartesian product of the scalar grids over all the coordinates. Thus, the number of particles of that grid increases exponentially with the state space dimension. This is the well-known drawback of most methods derived from Dynamic Programming. They do not take advantage of the distribution of the optimal state particles in the state space, and therefore cannot concentrate computations in the significant parts of the state space.
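To make the exponential growth concrete, take as an illustrative value the n = 200 points per coordinate used for the one-dimensional dam of §4. A Cartesian-product grid in a d-dimensional state space then has $n^d$ particles, and each of them requires its own minimization at every time stage of Algorithm 2:

$$n^d = 200^d:\qquad 200^1 = 200,\qquad 200^3 = 8\times 10^{6},\qquad 200^5 = 3.2\times 10^{11}.$$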

3.4. The adaptive mesh algorithm. Considering the difficulties faced by both stochastic and dynamic programming methods, we propose an alternative method for solving Problem (2.5) in the Markovian case. The method is based on the optimality conditions (2.15), and aims at

• dealing with the same number of noise particles from the beginning of the time 
horizon to the end: we thus hope that the generated feedback law estimators 
will have a reduced and fixed variance during all time stages; 

• attempting to alleviate the curse of dimensionality by operating on an adap- 
tive discretization grid automatically generated from the primary noise dis- 
cretization grid. 

3.4.1. Approximation. Let us denote by $(W_t^k)_{t=0,\dots,T}^{k=1,\dots,N}$ a set of N independent and identically distributed scenarios obtained from the noise random process. Given random control grids $u_t := (U_t^k)_{k=1,\dots,N}$ for $t = 0,\dots,T-1$, we can compute the state random grids $x_t := (X_t^k)_{k=1,\dots,N}$ by propagating the state dynamics equation:

$$X_0^k = W_0^k,\quad \forall k = 1,\dots,N, \tag{3.1a}$$

$$X_{t+1}^k = f_t(X_t^k,U_t^k,W_{t+1}^k),\quad \forall k = 1,\dots,N,\ \forall t = 0,\dots,T-1. \tag{3.1b}$$
The feedback at time stage t is obtained using an interpolation-regression operator on the grids $(x_t,u_t)$. Note that, unlike in Dynamic Programming methods, the state space grids $x_t$ are not fixed a priori. They are in fact adapted to the control particles, since they change whenever the decision maker changes its control strategy.

Remark 5. If the optimal state is concentrated in a region of the state space, the optimal feedback will be synthesized only inside that region. We need not compute it elsewhere, because the state will hardly ever reach other regions.

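In order to obtain approximations $\widehat{\boldsymbol{\Lambda}}_t$ of the co-state functions $\boldsymbol{\Lambda}_t$ introduced in Theorem 2.7, we compute particles $\lambda_t^k$ by integrating the backward co-state dynamic equation over the state grids $x_t$. We thus obtain co-state grids $l_t = (\lambda_t^k)_{k=1,\dots,N}$. Interpolation-regression operators $\mathfrak{R}_{\mathbb{X}_t}$ are then used to compute the co-state function at values off the current grid. More specifically, the process is initiated with $\widehat{\boldsymbol{\Lambda}}_T(\cdot) = \boldsymbol{\Lambda}_T(\cdot) = K'^{\top}(\cdot)$; then, for all $t = T-1,\dots,0$, one computes:

$$\lambda_t^k = \frac{1}{N}\sum_{j=1}^{N}\Big((L_t)'^{\top}_x(X_t^k,U_t^k,W_{t+1}^j) + (f_t)'^{\top}_x(X_t^k,U_t^k,W_{t+1}^j)\,\widehat{\boldsymbol{\Lambda}}_{t+1}\big(f_t(X_t^k,U_t^k,W_{t+1}^j)\big)\Big),\quad \forall k = 1,\dots,N, \tag{3.2a}$$

$$\widehat{\boldsymbol{\Lambda}}_t(\cdot) = \mathfrak{R}_{\mathbb{X}_t}(x_t,l_t)(\cdot). \tag{3.2b}$$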



Finally, for all $k = 1,\dots,N$ and all $t = 0,\dots,T-1$, the gradient particles $G_t^k$ are obtained as

$$G_t^k = \frac{1}{N}\sum_{j=1}^{N}\Big((L_t)'^{\top}_u(X_t^k,U_t^k,W_{t+1}^j) + (f_t)'^{\top}_u(X_t^k,U_t^k,W_{t+1}^j)\,\widehat{\boldsymbol{\Lambda}}_{t+1}\big(f_t(X_t^k,U_t^k,W_{t+1}^j)\big)\Big). \tag{3.3}$$

As already noticed, the direction associated with these particles represents the gradient only at the optimum.

3.4.2. Algorithm. We can now derive a descent-like algorithm to solve Problem (2.5) under Markovian assumptions. At each iteration, state particles are first propagated forward, with no interaction between particles. Co-state particles are then propagated backward, now with interaction caused by the interpolation-regression operations. Next, gradient particles are computed using (3.3), and the control particles are updated using a gradient-like method. Finally, a functional representation of the feedback laws is obtained thanks to an interpolation-regression operator.

Algorithm 3.

• Step [0].

Let $(u_t^{[0]})_{t=0,\dots,T-1} = \big((U_t^{[0],k})_{k=1,\dots,N}\big)_{t=0,\dots,T-1}$ be the initial control grids.

• Step [ℓ].

1. Compute the state grids $(x_t^{[\ell]})_{t=0,\dots,T}$ by propagating the dynamics (3.1) with $U = U^{[\ell]}$.

2. Compute both the co-state grids $(l_t^{[\ell]})_{t=0,\dots,T}$ and the functional approximations $(\widehat{\boldsymbol{\Lambda}}_t^{[\ell]})_{t=0,\dots,T}$ by propagating the dynamics (3.2) with $U = U^{[\ell]}$ and $X = X^{[\ell]}$.

3. Compute the gradient particles $(G_t^{[\ell],k})_{t=0,\dots,T-1}^{k=1,\dots,N}$ using Equation (3.3) with $U = U^{[\ell]}$, $X = X^{[\ell]}$ and $\widehat{\boldsymbol{\Lambda}} = \widehat{\boldsymbol{\Lambda}}^{[\ell]}$.

4. For all $t = 0,\dots,T-1$ and $k = 1,\dots,N$, update the control particles by performing a projected gradient step:

$$U_t^{[\ell+1],k} = \operatorname{proj}_{U_t^{ad}}\big(U_t^{[\ell],k} - \rho^{[\ell]}\,G_t^{[\ell],k}\big).$$

5. Stop if some degree of accuracy is reached. Otherwise set ℓ = ℓ + 1 and iterate.

• Step [∞].

1. Set the grids:

$$(u_t^{[\infty]})_{t=0,\dots,T-1} = \big((U_t^{[\ell],k})_{k=1,\dots,N}\big)_{t=0,\dots,T-1},\qquad (x_t^{[\infty]})_{t=0,\dots,T-1} = \big((X_t^{[\ell],k})_{k=1,\dots,N}\big)_{t=0,\dots,T-1}.$$

2. Obtain, for all $t = 0,\dots,T-1$, the feedback law:

$$\phi_t = \mathfrak{R}_{\mathbb{U}_t}(x_t^{[\infty]},u_t^{[\infty]}).$$

The only a priori discretization made during this algorithm concerns the noise sampling. Once this discretization has been performed, all other grids used to obtain the feedback laws are derived by integrating dynamic equations. In addition, no conditional expectation approximations are involved in the process. We only approximate expectations using Monte Carlo techniques, and it is well known that the variance of such an approximation does not depend on the dimension of the underlying space. The only operations that depend on the size of the space are the interpolation operators. These are used to approximate the co-state mappings during the iterative process, and the feedback laws once convergence is achieved.

Furthermore, this adaptive discretization makes computations concentrate in ef- 
fective state space regions, unlike dynamic programming which explores the whole 
state space. 
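The following is a minimal sketch of Algorithm 3. It runs on a hypothetical one-dimensional linear-quadratic problem (not the dam of §4), chosen because the Riccati recursion gives its optimal feedback. Several choices are assumptions for illustration only:

- The interpolation-regression operator $\mathfrak{R}_{\mathbb{X}_t}$ is an affine least-squares fit.
- The step size `rho` is fixed.
- The iteration count replaces the stopping test of Step 5.

```python
import random

# Hypothetical toy problem (not the dam of Section 4), chosen because its optimal feedback
# is known from the Riccati recursion:
#   X_{t+1} = X_t + U_t + W_{t+1},  L_t(x, u, w) = (x^2 + u^2) / 2,  K(x) = x^2 / 2,
#   U_t^ad = [U_MIN, U_MAX].
T, N = 3, 20
U_MIN, U_MAX = -5.0, 5.0


def f(x, u, w): return x + u + w
def dL_dx(x, u, w): return x
def dL_du(x, u, w): return u
def df_dx(x, u, w): return 1.0
def df_du(x, u, w): return 1.0
def dK(x): return x


def affine_regression(xs, ys):
    """Interpolation-regression operator assumed here: least-squares fit y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def adaptive_mesh(W, rho=0.1, iterations=400):
    """Algorithm 3 with a fixed step rho; W[0][k] = X_0^k and W[t][k] = W_t^k (t = 1..T)."""
    U = [[0.0] * N for _ in range(T)]                   # Step [0]: initial control grids
    for _ in range(iterations):
        # 1. Forward propagation of the state particles, (3.1): no interaction.
        X = [list(W[0])]
        for t in range(T):
            X.append([f(X[t][k], U[t][k], W[t + 1][k]) for k in range(N)])
        # 2-3. Backward co-state propagation (3.2) and gradient particles (3.3).
        lam_next = dK                                   # Lambda_T = K'
        G = [None] * T
        for t in reversed(range(T)):
            lam_t, G[t] = [], []
            for k in range(N):
                x, u = X[t][k], U[t][k]
                s_lam = s_grad = 0.0
                for j in range(N):                      # interaction: every noise particle
                    w = W[t + 1][j]
                    lam = lam_next(f(x, u, w))
                    s_lam += dL_dx(x, u, w) + df_dx(x, u, w) * lam
                    s_grad += dL_du(x, u, w) + df_du(x, u, w) * lam
                lam_t.append(s_lam / N)
                G[t].append(s_grad / N)
            lam_next = affine_regression(X[t], lam_t)   # (3.2b)
        # 4. Projected gradient step on every control particle.
        U = [[min(U_MAX, max(U_MIN, U[t][k] - rho * G[t][k])) for k in range(N)]
             for t in range(T)]
    return X, U, G


# Usage: sample the noise once (the only a priori grid), run, then synthesize phi_0.
rng = random.Random(0)
W = [[rng.uniform(-1.0, 1.0) for _ in range(N)]]
W += [[rng.gauss(0.0, 0.1) for _ in range(N)] for _ in range(T)]
X, U, G = adaptive_mesh(W)
phi0 = affine_regression(W[0], U[0])                   # Step [infinity] at t = 0
print(phi0(1.0) - phi0(0.0))                            # slope, about -1.6 / 2.6
```

For this toy problem, the recursion $P_T = 1$, $P_t = 1 + P_{t+1}/(1+P_{t+1})$ gives the optimal gain $P_{t+1}/(1+P_{t+1})$. It yields $P_1 = 1.6$, so $\phi_0$ should have slope close to $-1.6/2.6 \approx -0.615$. Only a small offset remains, due to the sample mean of the noise. The inner loop over `j` is where the particles interact through the regressed co-state, as described in §3.4.2.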

4. Case study. We consider the production management of a hydroelectric dam. The problem is formulated as a stochastic optimal control problem, and we solve it with the three methods described in §3.

4.1. Model. The problem is formulated in discrete time over 24 hours, using a constant time step of one hour. The index t = 0,...,T (where T = 24) defines the time discretization grid.

The water volume stored in the dam at time stage t = 0,...,T is a one-dimensional random variable $X_t \in L^2(\Omega,\mathcal{A},\mathbb{P};\mathbb{R})$ corresponding to the system state. This storage variable has to remain between given bounds: $\underline{x}$, the minimal volume to be kept in the dam, and $\overline{x}$, the maximal water volume the dam can contain. Hence, for all t = 0,...,T, the following almost-sure constraint must hold:

$$\underline{x} \le X_t \le \overline{x},\ \mathbb{P}\text{-a.s.} \tag{4.1}$$

The water inflow into the dam at stage t is denoted by $A_t$. It is a one-dimensional random variable with known probability law. We denote by $U_t \in L^2(\Omega,\mathcal{A},\mathbb{P};\mathbb{R})$ the one-dimensional random variable corresponding to the desired volume of water to be turbinated at stage t. We denote by $E_t$ the water volume effectively turbinated during the same time stage. In most cases, $E_t = U_t$. However, this equality cannot be achieved if it would take the dam below its minimal volume. Therefore, we have for all t = 0,...,T-1:

$$E_t = \min(U_t, X_t + A_{t+1} - \underline{x}),\ \mathbb{P}\text{-a.s.} \tag{4.2}$$
The shift in the time indices comes from the fact that the decision to turbinate is supposed to be made before observing the water inflow volume entering the dam: we are in a decision-hazard framework. Moreover, the control variables $U_t$ are subject to the following bounds: for all t = 0,...,T-1,

$$\underline{u} \le U_t \le \overline{u},\ \mathbb{P}\text{-a.s.} \tag{4.3}$$

Taking into account a possible overflow, the dam dynamics write, for t = 0,...,T-1:

$$X_{t+1} = \min(X_t - E_t + A_{t+1}, \overline{x}),\ \mathbb{P}\text{-a.s.} \tag{4.4}$$



Note that the constraints (4.1) are fully taken into account in the modelling of the dam dynamics. An electricity production $P_t$ is associated with the effectively turbinated water volume. It also depends on the water storage (more precisely, on the water level in the dam, due to the fall height effect):

$$P_t = g(X_t,E_t). \tag{4.5}$$

Let $(D_t)_{t=1,\dots,T}$ denote the electricity demand, which is supposed to be a stochastic process with known probability law. In our decision-hazard framework, production $P_t$ has to meet demand $D_{t+1}$. Either $P_t > D_{t+1}$, and the production excess is sold on the electricity market. Or $P_t < D_{t+1}$, and the gap must be compensated for, either by buying power on the market or by paying a penalty. The associated cost is modelled as

$$c_t(D_{t+1} - P_t). \tag{4.6}$$

Finally, we suppose that a penalty function K on the final stock $X_T$ is given, and that the initial condition $X_0$ is a random variable with known probability law. Let $(W_t)_{t=0,\dots,T}$ be the noise random process defined as

$$W_0 = X_0,$$

$$W_t = (A_t,D_t),\quad \forall t = 1,\dots,T.$$

We assume that the noises are fully observed in a nonanticipative way, and that the control variables are measurable with respect to the past noises. The dam management problem is then the following:

$$\min\ \mathbb{E}\Big(\sum_{t=0}^{T-1} c_t\big(D_{t+1} - g(X_t,E_t)\big) + K(X_T)\Big) \tag{4.7a}$$

subject to the constraints (4.2)-(4.3)-(4.4) and to the measurability constraints

$$U_t \preceq (W_0,\dots,W_t),\quad \forall t = 0,\dots,T-1. \tag{4.7b}$$

It precisely corresponds to the stochastic optimal control problem formulation (2.5) with the following notations:

• $w = (a, d)$,

• $L_t(x,u,w) = c_t\big(d - g(x,\min(u, x + a - \underline{x}))\big)$,

• $f_t(x,u,w) = \min\big(x - \min(u, x + a - \underline{x}) + a,\ \overline{x}\big)$,

• $U_t^{ad} = [\underline{u}, \overline{u}]$.
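The dam dynamics $f_t$ combine (4.2) and (4.4). A minimal sketch follows, using the bounds $[\underline{x},\overline{x}] = [0,2]$ of §4.2. The inflow argument `a` plays the role of $A_{t+1}$ (decision-hazard framework), and the function names are hypothetical.

```python
X_MIN, X_MAX = 0.0, 2.0  # state bounds [x_min, x_max] = [0, 2] of Section 4.2


def turbinated(x: float, u: float, a: float) -> float:
    """Effectively turbinated volume, (4.2): E = min(u, x + a - x_min)."""
    return min(u, x + a - X_MIN)


def dam_dynamics(x: float, u: float, w: tuple) -> float:
    """f_t(x, u, w) with w = (a, d), (4.4): whatever exceeds x_max overflows."""
    a, _d = w                      # the demand d does not enter the dynamics
    return min(x - turbinated(x, u, a) + a, X_MAX)


# Usage: normal operation, a nearly empty dam, and an overflow.
print(dam_dynamics(1.0, 0.5, (0.2, 0.0)))   # about 0.7
print(dam_dynamics(0.1, 1.0, (0.2, 0.0)))   # about 0.0 = x_min: only 0.3 is turbinated
print(dam_dynamics(1.9, 0.0, (0.5, 0.0)))   # 2.0 = x_max: 0.4 overflows
```

Since $E_t \le X_t + A_{t+1} - \underline{x}$, the next stock never drops below $\underline{x}$ (up to rounding). The outer min caps it at $\overline{x}$. This is why (4.1) needs no separate treatment.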
Remark 6. Both equations (4.2) and (4.4) incorporate the nondifferentiable operator min. We approximate this nonsmooth operator by the following operator, which depends on a smoothing parameter c:

$$\min\nolimits_c(x,y) = \begin{cases} y & \text{if } y < x - c, \\ \cdots & \text{if } x - c \le y \le x + c, \\ x & \text{if } y > x + c, \end{cases}$$

in order to recover a differentiable problem.
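One standard $C^1$ choice for such a smoothing operator is shown below for illustration. It is an assumption, not necessarily the operator used by the authors: a quadratic junction on $[x - c, x + c]$, namely $\min_c(x,y) = y - (y - x + c)^2/(4c)$. This expression matches both outer branches in value and in slope at $y = x \pm c$, and it differs from $\min(x,y)$ by at most $c/4$ (reached at $y = x$).

```python
def smooth_min(x: float, y: float, c: float) -> float:
    """C^1 approximation of min(x, y) with smoothing parameter c > 0 (quadratic junction assumed)."""
    if y < x - c:
        return y
    if y > x + c:
        return x
    return y - (y - x + c) ** 2 / (4.0 * c)


def smooth_min_dy(x: float, y: float, c: float) -> float:
    """Partial derivative with respect to y: goes continuously from 1 to 0 across the junction."""
    if y < x - c:
        return 1.0
    if y > x + c:
        return 0.0
    return 1.0 - (y - x + c) / (2.0 * c)


print(smooth_min(1.0, 1.0, 0.1))   # about 0.975 = min(1, 1) - c/4, the largest gap
```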

4.2. Numerical and functional data. Both electricity demand and water inflows correspond to white noises, obtained by adding a discrete disturbance around their mean trajectories. Using the Monte Carlo method, we draw N = 200 inflow trajectories and N = 200 demand trajectories, depicted in Figure 4.1 and Figure 4.2 respectively. The associated particles are denoted $(A_t^k)_{t=1,\dots,T}^{k=1,\dots,N}$ and $(D_t^k)_{t=1,\dots,T}^{k=1,\dots,N}$.




Fig. 4.1. Water inflow trajectories

Fig. 4.2. Electricity demand trajectories

The initial state $X_0$ follows a uniform probability law over $[\underline{x},\overline{x}] = [0,2]$. We also draw N particles $(X_0^k)_{k=1,\dots,N}$ for the initial state. Each one is associated with the previous trajectories of the same index k, to form one scenario among N. For each t = 0,...,T-1, the control random variables $U_t$ are subject to the bounds $[\underline{u},\overline{u}] = [0,1]$.

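The mapping g modelling the electricity production $P_t$ is chosen to represent a linear variation between 0.5 and 1 with respect to the water fall height $X_t - \underline{x}$:

$$g(X_t,E_t) = \frac{X_t + \overline{x} - 2\underline{x}}{2(\overline{x} - \underline{x})}\,E_t.$$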
The expression of the instantaneous cost $c_t$ is

$$c_t(y) = r_t\,(e^{y} - 1),$$

where $r_t$ is the electricity price at stage t. The variation of this price is depicted in Figure 4.3.

The final cost is an incentive to fill the dam at the end of the day:

$$K(x) = 12\,(x - \overline{x})^2.$$
Fig. 4.3. Electricity price $r_t$
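The instantaneous cost $L_t$ of §4.1 can be assembled from the data above. The following is a minimal sketch with three assumptions. First, it takes $g(x,e) = \frac{x + \overline{x} - 2\underline{x}}{2(\overline{x}-\underline{x})}\,e$, an efficiency varying linearly from 0.5 to 1 with the fall height, as described above. Second, it passes the price $r_t$ as a plain parameter, since its numerical values appear only in Figure 4.3. Third, the function names are hypothetical.

```python
import math

X_MIN, X_MAX = 0.0, 2.0  # [x_min, x_max] = [0, 2]


def production(x: float, e: float) -> float:
    """g(x, e): efficiency from 0.5 (x = x_min) to 1 (x = x_max), times the turbinated volume e."""
    return (x + X_MAX - 2.0 * X_MIN) / (2.0 * (X_MAX - X_MIN)) * e


def cost(r: float, y: float) -> float:
    """c_t(y) = r_t (exp(y) - 1), with y = demand minus production (negative y is a gain)."""
    return r * (math.exp(y) - 1.0)


def instantaneous_cost(r: float, x: float, u: float, w: tuple) -> float:
    """L_t(x, u, w) = c_t(d - g(x, min(u, x + a - x_min))) with w = (a, d)."""
    a, d = w
    e = min(u, x + a - X_MIN)
    return cost(r, d - production(x, e))


print(instantaneous_cost(1.0, 2.0, 1.0, (0.0, 1.0)))  # 0.0: full dam, production meets demand
```

The exponential makes shortfalls ($y > 0$) cost more than equal surpluses earn. For example, `cost(1.0, 1.0)` is about 1.718, while `cost(1.0, -1.0)` is about -0.632.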

4.3. Results using Dynamic Programming. We apply Algorithm 2 to Problem (4.7), using an even discretization of the state space $[\underline{x},\overline{x}]$ with n = 200 points:

$$x^i = \underline{x} + \frac{i-1}{n-1}\,(\overline{x} - \underline{x}),\quad \forall i = 1,\dots,n.$$

The two operators $\mathfrak{R}_{\mathbb{R}}$ and $\mathfrak{R}_{\mathbb{U}_t}$ are linear interpolation operators. To compute the value of a function at a point that does not belong to the grid, we take the weighted mean of its values at the two surrounding grid points. The optimal feedback laws $\phi_t$ obtained at each time stage t = 0,...,T-1 will be used as a reference in the comparison with the other resolution methods. The optimal cost is obtained by simulating the system using the optimal feedback laws over all trajectories:

$$c := \mathbb{E}\big(V_0(X_0)\big) = \frac{1}{2}\int_0^2 V_0(x)\,dx = 6.48. \tag{4.8}$$
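The grid of §4.3 and the integral in (4.8) can be illustrated as follows. This minimal sketch builds the even grid and evaluates $\mathbb{E}(V_0(X_0))$ for $X_0$ uniform on $[0,2]$ by the trapezoidal rule, applied to the trace of $V_0$ on that grid. The quadrature rule is an assumption, and the usage example uses a hypothetical trace $V_0(x) = x^2$, not the dam's Bellman function.

```python
def state_grid(x_min: float, x_max: float, n: int) -> list:
    """Even grid x^i = x_min + (i - 1)(x_max - x_min)/(n - 1), i = 1, ..., n."""
    return [x_min + i * (x_max - x_min) / (n - 1) for i in range(n)]


def expected_initial_value(v0: list, x_min: float = 0.0, x_max: float = 2.0) -> float:
    """E(V_0(X_0)) for X_0 uniform on [x_min, x_max], by the trapezoidal rule on the trace v0."""
    n = len(v0)
    h = (x_max - x_min) / (n - 1)
    integral = h * (sum(v0) - 0.5 * (v0[0] + v0[-1]))
    return integral / (x_max - x_min)       # uniform density 1 / (x_max - x_min)


# Usage with a hypothetical trace V_0(x) = x^2 (not the dam's Bellman function):
grid = state_grid(0.0, 2.0, 200)
print(expected_initial_value([x * x for x in grid]))  # close to (1/2) * 8/3 = 4/3
```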

4.4. Results obtained by Stochastic Programming. We then use a stochastic programming technique to solve Problem (4.7). Using quantization techniques, we first generate a scenario tree from the $N = 200$ noise trajectories. We will not discuss here the quantization method used to build such a scenario tree, and refer to [2] for further details. The resulting tree includes 2 nodes at stage $t = 0$, 4 nodes at stage $t = 1$, and so on up to stage $t = 6$, for which we have $2^{6+1} = 128$ nodes. Since $2^{8} = 256 > 200$, the tree structure becomes deterministic as soon as $t \ge 7$, each node of the tree corresponding to $t \ge 7$ having a unique future (Figure 3.1 just sketches the beginning of the story).
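Under our reading of the description (each node has two children until the $N = 200$ scenarios are exhausted), the number of nodes at stage $t$ is $\min(2^{t+1}, N)$. A short Python check of the counts quoted above; the function name `nodes_at_stage` is ours.

```python
N = 200  # number of noise trajectories (scenarios)

def nodes_at_stage(t, n_scenarios=N):
    # Binary branching from 2 nodes at t = 0, capped by the number of scenarios
    return min(2 ** (t + 1), n_scenarios)

counts = [nodes_at_stage(t) for t in range(9)]
print(counts)

# First stage at which doubling would exceed N: from then on each node has a unique future
first_deterministic = next(t for t in range(100) if 2 ** (t + 1) > N)
print(first_deterministic)
```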

Problem (4.7) is then formulated and optimized over the tree: the optimization process yields a pair $(x_\nu, u_\nu)$ of optimal values for the state and the control at each node $\nu$ of the tree. Figures 4.4 to 4.6 depict the optimal pairs of particles at different time stages (represented by dots) and the optimal feedback laws obtained by Dynamic Programming (represented by continuous curves). The comparison leads to the following conclusions.

• There are only two nodes corresponding to $t = 0$ in the scenario tree, and therefore only two optimal control particles. These two particles fit the optimal feedback obtained by Dynamic Programming rather accurately (see Figure 4.4), but it would be difficult to synthesize a feedback law from such a limited number of points.
• At stages $t = 12$ and $t = 23$, 200 optimal control particles are available. Nevertheless, these particles exhibit a visibly large variance (see Figures 4.5 and 4.6), so that it would again be difficult to synthesize a feedback law.



Fig. 4.4. Scenario tree: optimal control (t = 0); number of points = 2.
Fig. 4.5. Scenario tree: optimal control (t = 12); number of points = 200.
Fig. 4.6. Scenario tree: optimal control (t = 23); number of points = 200.



4.5. Results with the adaptive mesh method. We finally apply Algorithm 3 to solve the hydro-electric dam management problem (4.7). The result of this algorithm is an optimal feedback law $\phi_t$ for every time stage $t = 0,\dots,T-1$. We then draw new noise trajectories, independent from those used by the algorithm, and we simulate the system behavior along these new trajectories using the optimal feedback laws:

$$X_0 = W_0, \qquad X_{t+1} = f_t\big(X_t, \phi_t(X_t), W_{t+1}\big), \quad \forall t = 0,\dots,T-1,$$

and thus obtain an approximation of the optimal cost generated by this algorithm (the expectation being estimated by the empirical mean over the new trajectories):

$$c = \mathbb{E}\Big(\sum_{t=0}^{T-1} L_t\big(X_t, \phi_t(X_t), W_{t+1}\big) + K(X_T)\Big) = 6.51.$$
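The closed-loop simulation used to evaluate the feedback laws can be sketched generically as follows. The callables `f`, `phi`, `L` and `K` stand for the dynamics $f_t$, the feedback laws $\phi_t$, the instantaneous cost and the final cost $K$; the function name `simulate_cost` and the toy model in the usage example are hypothetical and are not the dam model of Problem (4.7).

```python
import numpy as np

def simulate_cost(f, phi, L, K, noises):
    """Monte Carlo estimate of E[sum_t L(t, X_t, phi_t(X_t), W_{t+1}) + K(X_T)].

    noises has shape (M, T + 1): row m is one noise trajectory (W_0, ..., W_T),
    and the initial state is taken as X_0 = W_0.
    """
    M, T_plus_1 = noises.shape
    x = noises[:, 0].astype(float)      # X_0 = W_0 for every trajectory (copy)
    total = np.zeros(M)
    for t in range(T_plus_1 - 1):
        u = phi(t, x)                   # U_t = phi_t(X_t)
        w = noises[:, t + 1]            # W_{t+1}
        total += L(t, x, u, w)          # instantaneous cost at stage t
        x = f(t, x, u, w)               # X_{t+1} = f_t(X_t, U_t, W_{t+1})
    total += K(x)                       # final cost K(X_T)
    return total.mean()                 # empirical mean over the M trajectories

# Hypothetical toy model (not the dam model): deterministic check with T = 2
noises = np.zeros((3, 3))
noises[:, 0] = 1.0                      # X_0 = 1 on all trajectories
cost = simulate_cost(
    f=lambda t, x, u, w: x - u + w,
    phi=lambda t, x: np.full_like(x, 0.25),
    L=lambda t, x, u, w: u,
    K=lambda x: x,
    noises=noises,
)
print(cost)
```

In the toy check, each stage costs 0.25 and the state goes from 1 to 0.5, so the total cost is 1 on every trajectory.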



This optimal cost is close to the cost obtained by Dynamic Programming (6.51 versus 6.48 in (4.8)). We are also interested in the controls generated by the adaptive mesh method. To this end, Figures 4.7, 4.8 and 4.9 show the optimal control particles given by the adaptive mesh method (dots), to be compared with the optimal feedback laws obtained by Dynamic Programming (curves) at different time stages.




Fig. 4.7. Particle: optimal control (t = 0).
Fig. 4.8. Particle: optimal control (t = 12).
Fig. 4.9. Particle: optimal control (t = 23).

We note that the optimal particles obtained by the adaptive mesh method are close to the feedback laws obtained by Dynamic Programming. By construction, there is the same number of particles at each time stage $t$, and the dispersion of the particles remains, at first sight, constant from the beginning to the end of the time horizon. This represents a significant advance compared to the scenario tree method. On the other hand, observe that particles may sometimes concentrate in restricted parts of the state space (see Figures 4.7 to 4.9). In our view, this is not a drawback but an advantage of the proposed method, in that the optimal feedback is computed only where it is needed. Indeed, the particles distribute adaptively and automatically according to the optimal probability density of the state (we call this an "adaptive mesh"; in Figure 4.7 the distribution is even because the initial condition is uniformly distributed), and this is an advantage over Dynamic Programming, in which a uniform grid is defined a priori over the whole state space, irrespective of the distribution of the optimal solution.

5. Conclusions and perspectives. In this paper we presented new tractable methods for solving stochastic optimal control problems in the discrete-time case. We derived several forms of the optimality conditions for such problems:

• non-adapted optimality conditions (2.7) as well as adapted optimality conditions (2.8) with measurability constraints on the past noise variables; these conditions involve conditional expectations, and the dimension of the conditioning random variable grows with the number of time stages;

• in the Markovian case, non-adapted optimality conditions (2.13) and adapted optimality conditions (2.14) with measurability constraints on the state variable; the conditional expectations are taken with respect to the instantaneous state variable, whose dimension is constant over the time stages, but the mean square error when approximating such conditional expectations depends on the state space dimension;

• still in the Markovian case, functional optimality conditions (2.15) involving only expectations.

The last conditions have been used to devise a gradient-like adaptive mesh algorithm 
in order to solve stochastic optimal control problems in the Markovian case, and we 
applied the algorithm to a hydro-electric dam management problem. 

In light of the numerical results, it is clear that the proposed adaptive mesh algorithm represents a significant advance with respect to usual stochastic programming techniques (it keeps the same number of particles and the same particle dispersion at every time stage). In addition, the adaptive mesh feature may avoid useless computations in some problems, depending on the profile of the optimal state probability density. In fact, the only a priori discretization concerns the noise particles, and it does not depend on the dimension of the underlying state space: the only operator which could be dimension-dependent is the interpolation operator.

Future work will concentrate on the convergence rate of the mesh algorithm with 
respect to the number N of noise trajectories. We will also deal with stochastic optimal 
control problems involving a multi-dimensional state vector, and try to quantify the 
impact of the interpolation operator on the approximation error. 

Appendix A. Optimization on a Hilbert space: a special case. Let $\mathcal{H}$ be a Hilbert space, let $\mathcal{H}^{\mathrm{ad}}$ be a closed convex subset of $\mathcal{H}$ and let $f$ be a real-valued function defined on $\mathcal{H}$. We consider the following optimization problem:

$$\min_{x \in \mathcal{H}^{\mathrm{ad}}} f(x). \qquad (A.1)$$

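In the following, $\chi_{\mathcal{H}'}$ will denote the indicator function of a subset $\mathcal{H}' \subset \mathcal{H}$, namely

$$\chi_{\mathcal{H}'}(x) = \begin{cases} 0 & \text{if } x \in \mathcal{H}', \\ +\infty & \text{otherwise.} \end{cases}$$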

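The optimization literature gives different expressions for the necessary optimality conditions of an optimization problem in a general Hilbert space. For instance, if $f$ is differentiable and $x^\sharp$ is a solution of (A.1), then $x^\sharp$ satisfies the following equivalent conditions:

$$\forall x \in \mathcal{H}^{\mathrm{ad}}, \quad \langle f'(x^\sharp), x - x^\sharp \rangle \ge 0, \qquad (A.2a)$$
$$f'(x^\sharp) \in -\partial\chi_{\mathcal{H}^{\mathrm{ad}}}(x^\sharp), \qquad (A.2b)$$
$$\forall \epsilon > 0, \quad x^\sharp = \mathrm{proj}_{\mathcal{H}^{\mathrm{ad}}}\big(x^\sharp - \epsilon \nabla f(x^\sharp)\big). \qquad (A.2c)$$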

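We now consider a specific structure for the feasible set $\mathcal{H}^{\mathrm{ad}}$. More precisely, we assume that $\mathcal{H}^{\mathrm{ad}} = \mathcal{H}^{\mathrm{cv}} \cap \mathcal{H}^{\mathrm{sp}}$, $\mathcal{H}^{\mathrm{sp}}$ being a closed linear subspace of $\mathcal{H}$ and $\mathcal{H}^{\mathrm{cv}}$ being a closed convex subset of $\mathcal{H}$. We moreover assume that the following property holds.

Assumption 5. The sets $\mathcal{H}^{\mathrm{sp}}$ and $\mathcal{H}^{\mathrm{cv}}$ are such that $\mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(\mathcal{H}^{\mathrm{sp}}) \subset \mathcal{H}^{\mathrm{sp}}$.

Then the projection operator onto $\mathcal{H}^{\mathrm{ad}}$ has the following property.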

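Lemma A.1. Under Assumption 5, the following relation holds true:

$$\mathrm{proj}_{\mathcal{H}^{\mathrm{cv}} \cap \mathcal{H}^{\mathrm{sp}}} = \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}} \circ \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}.$$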

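Proof. Let $x \in \mathcal{H}$ and $y \in \mathcal{H}^{\mathrm{cv}} \cap \mathcal{H}^{\mathrm{sp}}$. Then

$$\begin{aligned}
\big\langle x - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x)) \,,\, y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x)) \big\rangle
={} & \big\langle x - \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x) \,,\, y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x)) \big\rangle \\
& + \big\langle \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x) - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x)) \,,\, y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x)) \big\rangle .
\end{aligned}$$

From the characterization of the projection of $z := \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x)$ onto the convex subset $\mathcal{H}^{\mathrm{cv}}$ (recall that $y \in \mathcal{H}^{\mathrm{cv}}$), the last inner product in the previous expression is nonpositive; therefore

$$\big\langle x - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(z) \,,\, y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(z) \big\rangle \le \big\langle x - z \,,\, y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(z) \big\rangle .$$

Since $z \in \mathcal{H}^{\mathrm{sp}}$ and $\mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(\mathcal{H}^{\mathrm{sp}}) \subset \mathcal{H}^{\mathrm{sp}}$, we have $\mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(z) \in \mathcal{H}^{\mathrm{sp}}$, and we deduce that $y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(z) \in \mathcal{H}^{\mathrm{sp}}$. Since $\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}$ is a self-adjoint operator which leaves the elements of $\mathcal{H}^{\mathrm{sp}}$ unchanged, we have

$$\begin{aligned}
\big\langle x - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(z) \,,\, y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(z) \big\rangle
&\le \big\langle x - z \,,\, \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}\big(y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(z)\big) \big\rangle \\
&= \big\langle \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x - z) \,,\, y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}(z) \big\rangle \\
&= 0,
\end{aligned}$$

the last equality arising from the fact that $\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x - z) = \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x) - \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(z) = z - z = 0$ (since $\mathcal{H}^{\mathrm{sp}}$ is a linear subspace, $\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}$ is a linear operator, and $\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(z) = z$ because $z \in \mathcal{H}^{\mathrm{sp}}$). We thus conclude that, for all $y \in \mathcal{H}^{\mathrm{cv}} \cap \mathcal{H}^{\mathrm{sp}}$,

$$\big\langle x - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}} \circ \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x) \,,\, y - \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}} \circ \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x) \big\rangle \le 0 .$$

Since $\mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}} \circ \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x)$ itself belongs to $\mathcal{H}^{\mathrm{cv}} \cap \mathcal{H}^{\mathrm{sp}}$ (by Assumption 5), this variational inequality characterizes it as the projection of $x$ onto $\mathcal{H}^{\mathrm{ad}} = \mathcal{H}^{\mathrm{cv}} \cap \mathcal{H}^{\mathrm{sp}}$. □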
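A finite-dimensional illustration of Lemma A.1, under assumptions of our own choosing: $\mathcal{H} = \mathbb{R}^6$, $\mathcal{H}^{\mathrm{sp}}$ is the subspace of vectors that are constant on given blocks of indices (a discrete analogue of a measurability constraint, whose projection is block averaging), and $\mathcal{H}^{\mathrm{cv}}$ is the box $[0,1]^6$ (whose projection is componentwise clipping). Clipping keeps block-constant vectors block-constant, so Assumption 5 holds. The sketch compares the composed projection with a brute-force projection onto the intersection; all function names are hypothetical.

```python
import numpy as np

def proj_sp(x, blocks):
    """Projection onto the subspace of vectors constant on each block (block averaging)."""
    y = np.empty_like(x)
    for b in blocks:
        y[b] = x[b].mean()
    return y

def proj_cv(x, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^n (componentwise clipping)."""
    return np.clip(x, lo, hi)

def proj_intersection_bruteforce(x, blocks, lo=0.0, hi=1.0, n_grid=200001):
    """Projection onto box and subspace jointly, by grid search of the common value per block."""
    c = np.linspace(lo, hi, n_grid)
    y = np.empty_like(x)
    for b in blocks:
        # squared distance from x[b] to the constant vector (c, ..., c), for every candidate c
        cost = ((x[b][:, None] - c[None, :]) ** 2).sum(axis=0)
        y[b] = c[np.argmin(cost)]
    return y

rng = np.random.default_rng(1)
x = rng.normal(0.5, 1.0, size=6)
blocks = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5])]

lhs = proj_intersection_bruteforce(x, blocks)   # proj onto H^cv ∩ H^sp
rhs = proj_cv(proj_sp(x, blocks))               # proj_cv ∘ proj_sp (Lemma A.1)
print(np.max(np.abs(lhs - rhs)))
```

The remaining discrepancy only reflects the resolution of the grid search (step $5 \times 10^{-6}$).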

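The following proposition gives necessary optimality conditions for Problem (A.1) when the feasible set $\mathcal{H}^{\mathrm{ad}}$ has the specific structure $\mathcal{H}^{\mathrm{cv}} \cap \mathcal{H}^{\mathrm{sp}}$.

Proposition A.2. We suppose that Assumption 5 is fulfilled and that $f$ is differentiable. If $x^\sharp$ is a solution of (A.1), then

$$\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}\big(f'(x^\sharp)\big) \in -\partial\chi_{\mathcal{H}^{\mathrm{cv}}}(x^\sharp).$$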

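Proof. Let $x^\sharp$ be a solution of (A.1). Using Condition (A.2c) and Lemma A.1, we obtain that

$$x^\sharp = \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}} \circ \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}\big(x^\sharp - \epsilon \nabla f(x^\sharp)\big), \quad \forall \epsilon > 0.$$

But $\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}$ is a linear operator and $x^\sharp \in \mathcal{H}^{\mathrm{sp}}$, so that

$$x^\sharp = \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}}\big(x^\sharp - \epsilon\, \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(\nabla f(x^\sharp))\big), \quad \forall \epsilon > 0.$$

From the equivalence between (A.2b) and (A.2c), applied with $\mathcal{H}^{\mathrm{cv}}$ in place of $\mathcal{H}^{\mathrm{ad}}$ and $\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(\nabla f(x^\sharp))$ in place of $\nabla f(x^\sharp)$, the last relation is equivalent to $\mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(f'(x^\sharp)) \in -\partial\chi_{\mathcal{H}^{\mathrm{cv}}}(x^\sharp)$. □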
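Combined with Lemma A.1, Condition (A.2c) suggests a projected gradient iteration $x \leftarrow \mathrm{proj}_{\mathcal{H}^{\mathrm{cv}}} \circ \mathrm{proj}_{\mathcal{H}^{\mathrm{sp}}}(x - \epsilon \nabla f(x))$. The minimal sketch below assumes $\mathcal{H} = \mathbb{R}^6$, a box $\mathcal{H}^{\mathrm{cv}} = [0,1]^6$, a subspace $\mathcal{H}^{\mathrm{sp}}$ of block-constant vectors, and a hypothetical strongly convex quadratic $f$. It runs this iteration and then checks the conclusion of Proposition A.2 in its fixed-point form for several values of $\epsilon$.

```python
import numpy as np

def proj_sp(x, blocks):
    """Projection onto block-constant vectors (block averaging)."""
    y = np.empty_like(x)
    for b in blocks:
        y[b] = x[b].mean()
    return y

def proj_cv(x, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^n."""
    return np.clip(x, lo, hi)

rng = np.random.default_rng(2)
n = 6
blocks = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5])]
a = rng.normal(0.5, 1.0, size=n)    # hypothetical data
d = rng.uniform(1.0, 3.0, size=n)   # positive diagonal weights

def grad_f(x):
    # Gradient of the hypothetical objective f(x) = 0.5 * sum_i d_i * (x_i - a_i)^2
    return d * (x - a)

# Projected gradient: the projection onto H^ad is proj_cv ∘ proj_sp (Lemma A.1)
x = np.zeros(n)
step = 1.0 / d.max()
for _ in range(2000):
    x = proj_cv(proj_sp(x - step * grad_f(x), blocks))

# Proposition A.2 in fixed-point form: x = proj_cv(x - eps * proj_sp(grad f(x)))
for eps in (0.1, 1.0, 10.0):
    residual = x - proj_cv(x - eps * proj_sp(grad_f(x), blocks))
    print(eps, np.max(np.abs(residual)))
```

For this separable $f$, the limit on each block is the clipped weighted mean of the $a_i$, which gives an independent check of the iteration.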